1. Introduction

Suppose we have a non-constructive proof: a proof that some quantity 
exists but which does not actually tell us what the value is. Such a thing 
might happen because the proof proceeds by contradiction, because it uses 
the axiom of choice, or because the proof depends on an abstract technique 
which takes us far afield from the quantity we originally set out to prove the
existence of.

We might then ask whether there is any constructive proof of the statement— 
a proof which actually calculates the quantities in the statement. The answer 
may be no. Even if the answer is yes, it may require finding a completely 
different proof, which could be as hard as—or harder than—proving the 
theorem in the first place. 

Our topic, however, is circumstances which guarantee that the answer 
is “yes”: when knowing that there is a non-constructive proof guarantees 
the existence of a constructive proof, and further, tells us how to find it. 
Strikingly, at least to those unfamiliar with proof theory, both the circumstances
and the method for finding a constructive proof are syntactic: they
depend on the written form of the theorem and its proof. While there exists 
more than one such method, our focus in this paper will be the functional 
interpretation (also called the “Dialectica” interpretation, after the journal 
where it first appeared [T8]). 

If we hold a strongly formal view of mathematics, we can ask for proper 
“meta-theorems”: we could take the view that a proof in mathematics is 
always, at heart, a formal deduction in our favorite system of axioms—say, 
ZFC. Then we could hope for a formal proof that, given any deduction of 
a theorem with the necessary syntactic property, there exists a constructive 
deduction of the same theorem. For many choices of axioms (not currently 
including ZFC, but including systems which suffice to formalize most of 
mathematical practice), such meta-theorems actually exist. 

These theorems are of limited applicability by themselves, however. Even 
if one believes that actual proofs, as written in textbooks and articles, are 
intended as descriptions of formal deductions, obtaining those formal deductions
is an arduous (and tedious) process. The usefulness of the functional
interpretation comes from the fact that it can be applied, with a feasible
amount of effort, to actual proofs as written by and for humans. Of course,
we cannot have a theorem about “all proofs accepted by mathematicians”;
in the broader setting we have only a heuristic: the functional interpretation
is a practical method for producing ordinary constructive proofs from
ordinary non-constructive ones as already written up in journals.^1

As the title promises, our goal in this paper is to introduce this method by 
focusing on a concrete example. In section 2 we’ll present a non-constructive
proof of a theorem on approximations of $L_1$ functions. In section 3 we’ll
present a corresponding quantitative theorem with a constructive proof.
These two sections will be completely elementary: all that’s needed is some
very basic real analysis. (There is nothing new in our analysis of this theorem,
which derives from Kohlenbach and Oliva’s work [53]. We have chosen
this example because the underlying theorem and proof are simple enough 
that they can be discussed in detail in a reasonable space.) 

With a motivating example in place, in section 4 we will finally introduce
the functional interpretation, illustrating that the example in section 3 is
an instance of a general method. This section will necessarily involve some 
actual formal logic—we’ll assume some familiarity with ordinary first-order 
logic and the notion of computability, although we will keep the dependence 
as minimal as possible. 

We will conclude with some references to further applications in the literature.

The author is grateful to Jeremy Avigad and Ulrich Kohlenbach for helpful 
suggestions on a draft of this paper. 

2. A Theorem of Jackson’s 

Throughout, we will only be concerned with real-valued functions on [0,1],
which we will refer to as just “functions”. When we write $\int f\,dx$ without bounds,
we always mean the integral $\int_0^1 f\,dx$.

Definition 2.1. We say $f$ is an $L_1$-function if $\int |f|\,dx$ exists. The $L_1$-norm
is defined on such functions by

$$\|f\|_1 = \int |f|\,dx.$$


^1 Because proof theory is historically tied to intuitionist and formalist philosophical
views, its dependence on these philosophies is sometimes overstated. One need not believe 
that formal deductions, constructive proofs, or syntax play any special role in mathematics 
for proof theoretic methods to be useful. Even someone who believes that axiomatic proofs 
are artificial constructs of no intrinsic importance should recognize that large swaths of 
mathematics just happen to be formalizable, and therefore that methods derived from 
their study just happen to be useful in practice. Our attitude can be the one attributed 
to Niels Bohr regarding the lucky horseshoe above his door: “I am told it works even if 
you don’t believe in it”. 
Many bizarre, complicated, or difficult-to-work-with functions are nonetheless
$L_1$, so it is natural to ask about approximating such functions by nicer
classes of functions; when the distance is measured using the $L_1$-norm, this
is known as mean approximation. In particular:

Definition 2.2. Let $f$ be an $L_1$-function, and let $\mathcal{P}$ be a collection of
$L_1$-functions. A best (mean) approximation to $f$ in $\mathcal{P}$ is a function $p \in \mathcal{P}$ such
that for every $q \in \mathcal{P}$,

$$\|f - p\|_1 \le \|f - q\|_1.$$


We will mostly be interested in approximation by polynomials of low 
degree. 

Definition 2.3. We write $\mathcal{P}_n$ for the collection of polynomials of degree at
most $n$.

When $\mathcal{P}$ is finite dimensional, the existence of a best approximation follows
immediately from a theorem of Riesz [43]. Our focus will be on the
question of whether the best approximation is unique. For an arbitrary set
$\mathcal{P}$, it need not be:

Example 2.4. Let $f$ be the function which is constantly equal to 2, and
let $\mathcal{P}$ consist of all piecewise continuous $L_1$-functions $p$ such that $\|p\|_1 \le 1$.
Clearly the function $p$ which is constantly equal to 1 is a best approximation
to $f$ in $\mathcal{P}$, but the function

$$x \mapsto \begin{cases} 2 & \text{if } 0 \le x < 1/2 \\ 0 & \text{if } 1/2 \le x \le 1 \end{cases}$$

is also a best approximation.
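
The example can be checked numerically. The following is only a sketch: the helper `l1_norm` and the names `p_const` and `p_step` are ours, and the integral is approximated by a midpoint rule rather than computed exactly. Both candidates sit at $L_1$-distance 1 from the constant function 2, and the step function has $L_1$-norm 1, so it belongs to the collection.

```python
def l1_norm(func, steps=10_000):
    """Midpoint-rule approximation of the integral of |func| over [0, 1]."""
    width = 1.0 / steps
    return sum(abs(func((i + 0.5) * width)) for i in range(steps)) * width

f = lambda x: 2.0                            # the target function of Example 2.4
p_const = lambda x: 1.0                      # the constant best approximation
p_step = lambda x: 2.0 if x < 0.5 else 0.0   # the step-function best approximation

dist_const = l1_norm(lambda x: f(x) - p_const(x))
dist_step = l1_norm(lambda x: f(x) - p_step(x))
print(dist_const, dist_step, l1_norm(p_step))
```

No member of the collection can do better, since $\|f - p\|_1 \ge \|f\|_1 - \|p\|_1 \ge 2 - 1$.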

Uniqueness of approximations is related to a property known as strict
convexity of the norm (see, for instance, [12] for more general theory on
the subject of approximations); since the $L_1$-norm is not strictly convex,
in general there is not a unique best mean approximation. However, for
particular classes of functions, ad hoc arguments can still give uniqueness,
and an example of such a result is due to Jackson:

Theorem 2.5. Let $f$ be a continuous function. Then for any $n$, there is a
unique best approximation to $f$ in $\mathcal{P}_n$.

There are a number of proofs of this theorem [36, 42, 19], but we will examine
a direct proof (in particular, avoiding the use of measure theory) due
to Cheney, which we break into several lemmas to make the subsequent
analysis easier. (Certain equations are labeled because we will want to refer
to them later.)

We need a few definitions. 


Definition 2.6. If $f$ is a continuous function, $\|f\|_\infty = \sup_{x \in [0,1]} |f(x)|$.
(This is a simplification of the usual $L_\infty$ norm; we do not need the $L_\infty$
norm for discontinuous functions, and so can get away with this simplified
definition.) Note that, since $f$ is assumed to be continuous, $\|f\|_\infty$ is defined
and finite.

Definition 2.7. If $g$ is a function, we write $\operatorname{sgn} g$ for the sign function, given
by

$$\operatorname{sgn} g(x) = \begin{cases} 1 & \text{if } g(x) > 0 \\ 0 & \text{if } g(x) = 0 \\ -1 & \text{if } g(x) < 0 \end{cases}$$

Lemma 2.8. Let $g$ and $h$ be continuous functions such that $g$ has finitely
many zeros and $\int h \operatorname{sgn} g\,dx \ne 0$. Then for some $\lambda$,

$$\|g - \lambda h\|_1 < \|g\|_1.$$

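Proof. Let $x_1, \ldots, x_k$ be the zeros of $g$. We partition the interval [0,1] into
two sets, $A$ and $B$. $B$ will consist of a small open interval around each $x_i$:

$$B = \bigcup_{i \le k} (x_i - \varepsilon, x_i + \varepsilon) \cap [0,1]$$

while $A = [0,1] \setminus B$. We will have to choose $\varepsilon$ small enough. Observe that

$$\int_B |h(x)|\,dx \le \|h\|_\infty \cdot k \cdot 2\varepsilon, \tag{1}$$

so by choosing $\varepsilon$ very small, we may arrange for $\int_B |h(x)|\,dx$ to be as small
as we need.

On the other hand, we know that $\left|\int h \operatorname{sgn} g\,dx\right| > 0$, and we have

\begin{align*}
\left|\int_A h \operatorname{sgn} g\,dx\right| &\ge \left|\int h \operatorname{sgn} g\,dx\right| - \left|\int_B h \operatorname{sgn} g\,dx\right| \\
&\ge \left|\int h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx. \tag{2}
\end{align*}

In particular, by choosing $\varepsilon$ small enough that $2\int_B |h(x)|\,dx < \left|\int h \operatorname{sgn} g\,dx\right|$,
we may arrange to have

$$\left|\int_A h \operatorname{sgn} g\,dx\right| > \int_B |h(x)|\,dx. \tag{3}$$

$|g|$ is continuous, and therefore the restriction of $|g|$ to $A$ has a minimum
somewhere on the closed set $A$, and since $g$ has no zeros in $A$, this infimum
$m = \inf\{|g(x)| : x \in A\}$ must be positive. Choose $\lambda$ so that $0 < |\lambda| \cdot \|h\|_\infty < m$
and $\operatorname{sgn} \lambda = \operatorname{sgn}\left(\int_A h \operatorname{sgn} g\,dx\right)$. Then we have $|\lambda h(x)| < m \le |g(x)|$ and
$\operatorname{sgn} g = \operatorname{sgn}(g - \lambda h)$ for every $x \in A$. Now we can compute: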

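\begin{align*}
\|g - \lambda h\|_1 &= \int |g - \lambda h|\,dx \\
&= \int_A |g - \lambda h|\,dx + \int_B |g - \lambda h|\,dx \\
&= \int_A (g - \lambda h) \operatorname{sgn} g\,dx + \int_B |g - \lambda h|\,dx \\
&= \int_A |g|\,dx - \lambda \int_A h \operatorname{sgn} g\,dx + \int_B |g - \lambda h|\,dx \\
&= \int |g|\,dx - \int_B |g|\,dx - \lambda \int_A h \operatorname{sgn} g\,dx + \int_B |g - \lambda h|\,dx \\
&\le \int |g|\,dx - \lambda \int_A h \operatorname{sgn} g\,dx + \int_B \bigl(|g| + |\lambda|\,|h|\bigr)\,dx - \int_B |g|\,dx \\
&= \int |g|\,dx - |\lambda| \left|\int_A h \operatorname{sgn} g\,dx\right| + |\lambda| \int_B |h|\,dx \tag{4} \\
&= \int |g|\,dx - |\lambda| \left( \left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx \right) \\
&< \int |g|\,dx \\
&= \|g\|_1.
\end{align*}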


□ 
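
A small numerical illustration of the conclusion of Lemma 2.8 may help. This is a sketch under our own choices, not part of the proof: we take $g(x) = x - 1/2$ and $h(x) = x$, pick $\lambda = 0.1$ by hand (with the sign of $\int h \operatorname{sgn} g\,dx$) rather than through the bound $|\lambda| \cdot \|h\|_\infty < m$, and approximate integrals by a midpoint rule.

```python
def integrate(func, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    width = 1.0 / steps
    return sum(func((i + 0.5) * width) for i in range(steps)) * width

def sgn(value):
    """Sign of a real number: 1, 0, or -1."""
    return (value > 0) - (value < 0)

g = lambda x: x - 0.5   # a single zero, at x = 1/2
h = lambda x: x         # chosen so that the integral of h * sgn(g) is non-zero

pairing = integrate(lambda x: h(x) * sgn(g(x)))   # positive for this choice of h
norm_g = integrate(lambda x: abs(g(x)))

lam = 0.1               # hand-picked, with the same sign as `pairing`
norm_right_sign = integrate(lambda x: abs(g(x) - lam * h(x)))
norm_wrong_sign = integrate(lambda x: abs(g(x) + lam * h(x)))
print(pairing, norm_g, norm_right_sign, norm_wrong_sign)
```

Moving in the direction of $h$ with the right sign shrinks the norm, while the opposite sign enlarges it, which is why the proof fixes $\operatorname{sgn} \lambda$.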


Lemma 2.9. If $f$ is continuous and $p$ is a best approximation to $f$ in $\mathcal{P}_n$
then $f - p$ has at least $n + 1$ zeros.

Proof. Let $g(x) = f(x) - p(x)$. Suppose the statement is false, so $g$ has at
most $n$ zeros. Since $f$, and therefore $g$, is continuous, there are at most $n$
points where $g$ changes sign. For some $m \le n$, we may choose points

$$0 < r_1 < r_2 < \cdots < r_m < 1$$

so that the $r_i$ are exactly the interior points where $g$ changes sign. Consider
the polynomial

$$h(x) = \prod_{i=1}^{m} (x - r_i).$$

Since $h(x)$ also changes sign exactly at the $r_i$, it follows that $h \operatorname{sgn} g$ is either
always non-negative or always non-positive, and is not constantly 0, and
therefore

$$\int h \operatorname{sgn} g\,dx \ne 0. \tag{5}$$

By Lemma 2.8 there is a $\lambda$ such that $\|g - \lambda h\|_1 < \|g\|_1$, and equivalently,

$$\|f - (p + \lambda h)\|_1 = \|g - \lambda h\|_1 < \|g\|_1.$$

But $p + \lambda h$ is a polynomial of degree at most $n$, contradicting the assumption
that $p$ was a best approximation. □
In order to derive (5), we used the following seemingly trivial fact, which
we will need again:

Lemma 2.10. Let $q$ be a continuous function with $\int |q(x)|\,dx = 0$. Then
$\|q\|_\infty = 0$.

Theorem 2.11. Let $f$ be a continuous function. Then for any $n$, there is
a unique best approximation to $f$ in $\mathcal{P}_n$.

Proof. Suppose $p_1, p_2$ are both best approximations of $f$ by polynomials of
degree at most $n$. Let $p$ be the average, $p(x) = \frac{1}{2}(p_1(x) + p_2(x))$. Clearly $p$
is also a polynomial of degree at most $n$. Also

\begin{align*}
\|f - p\|_1 &= \int |f(x) - p(x)|\,dx \\
&= \int \left|f(x) - \tfrac{1}{2}(p_1(x) + p_2(x))\right| dx \\
&\le \tfrac{1}{2}\|f - p_1\|_1 + \tfrac{1}{2}\|f - p_2\|_1.
\end{align*}

This means that $p$ is also a best approximation of $f$.

Since $p_1$ and $p_2$ are both best approximations, we have
$\|f - p\|_1 = \|f - p_1\|_1 = \|f - p_2\|_1$, and therefore

\begin{align*}
0 &= \tfrac{1}{2}\|f - p_1\|_1 + \tfrac{1}{2}\|f - p_2\|_1 - \|f - p\|_1 \\
&= \int \tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)|\,dx. \tag{6}
\end{align*}

On the other hand, by the triangle inequality, for every $x$,

\begin{align*}
|f(x) - p(x)| &= \left|f(x) - \tfrac{1}{2}(p_1(x) + p_2(x))\right| \\
&= \left|\tfrac{1}{2}(f(x) - p_1(x)) + \tfrac{1}{2}(f(x) - p_2(x))\right| \\
&\le \tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)|.
\end{align*}

In particular,

$$\tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)| \ge 0 \tag{7}$$

for every $x$.

Combining (6) with (7) using Lemma 2.10, for all $x$

$$\tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)| = 0.$$

Therefore for every $x$,

$$|f(x) - p(x)| = \tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)|.$$

Since $p$ is a best approximation to $f$, by Lemma 2.9 $f - p$ must have at
least $n + 1$ zeros.

Suppose $r$ is a zero of $f - p$. Then

$$0 = |f(r) - p(r)| = \tfrac{1}{2}|f(r) - p_1(r)| + \tfrac{1}{2}|f(r) - p_2(r)|$$

and therefore

$$\tfrac{1}{2}|f(r) - p_1(r)| = \tfrac{1}{2}|f(r) - p_2(r)| = 0.$$

Then $p_1(r) = f(r) = p_2(r)$, and therefore $p_1(r) - p_2(r) = 0$. $p_1 - p_2$ is a
polynomial of degree at most $n$ which is 0 at every zero of $f - p$. The only
polynomial of degree at most $n$ which has $n + 1$ zeros is 0, so $p_1 - p_2$ must
be constantly 0, and therefore $p_1 = p_2$. □

3. Rates of Unicity 

Not all uniqueness results are created equal. Once we know there is a 
unique approximation, we can ask for more detailed quantitative information 
about the approximation. 

Definition 3.1. $p \in \mathcal{P}_n$ is an $\varepsilon$-nearly best approximation to $f$ in $\mathcal{P}_n$ if for
every $q \in \mathcal{P}_n$,

$$\|f - p\|_1 \le \|f - q\|_1 + \varepsilon.$$

We know that two best approximations in $\mathcal{P}_n$ must be equal, so the analogous
question to ask is whether $\varepsilon$-nearly best approximations must be near
each other. More precisely, we look for a function $\Phi_f$ such that if $p_1$ and $p_2$
are $\Phi_f(\delta)$-nearly best approximations then $\|p_1 - p_2\|_1 < \delta$. Such a function
is known as a modulus of uniqueness for $f$.

It’s clear from the definition that if there is any modulus of uniqueness
then there are many functions meeting the definition (for instance, if $\Phi_f$ is
a modulus of uniqueness for $f$ and $\Psi(\delta) \le \Phi_f(\delta)$ for every $\delta$ then $\Psi$ is also a
modulus of uniqueness for $f$). However, if there is a unique best approximation
to $f$, we would expect that there is a “nice” modulus of uniqueness:
one that is continuous, at least from one side, and where $\lim_{\delta \to 0} \Phi_f(\delta) = 0$.
The optimal modulus has these properties under suitable conditions, and
is sometimes called "the" modulus of uniqueness. An arbitrary modulus of
uniqueness, however, need not be so nice. The particular case where $c\delta$ is
a modulus of uniqueness for some constant $c$ is called strong unicity, and is
well-studied in approximation theory (again, see [12]).

A second type of quantitative information is the stability of the mean
approximation under small changes to the function. Let us write $\mathcal{A}$ for the
functional mapping $f$ to its best approximation in $\mathcal{P}$. When $\|f - f'\|_1 < \varepsilon$,
what can we say about $\|\mathcal{A}(f) - \mathcal{A}(f')\|_1$? A modulus of uniqueness
immediately answers this question.

Theorem 3.2. Suppose $f, f'$ have unique mean approximations in $\mathcal{P}$, $\Phi_f$ is
the modulus of uniqueness for $f$, and $\|f - f'\|_1 < \frac{1}{2}\Phi_f(\delta)$. Then
$\|\mathcal{A}(f) - \mathcal{A}(f')\|_1 < \delta$.
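Proof. Suppose $\|f - f'\|_1 < \frac{1}{2}\Phi_f(\delta)$. For any $q \in \mathcal{P}$,

\begin{align*}
\|f - \mathcal{A}(f')\|_1 &\le \|f - f'\|_1 + \|f' - \mathcal{A}(f')\|_1 \\
&\le \|f - f'\|_1 + \|f' - q\|_1 \\
&\le 2\|f - f'\|_1 + \|f - q\|_1 \\
&< \|f - q\|_1 + \Phi_f(\delta).
\end{align*}

Therefore $\mathcal{A}(f')$ is a $\Phi_f(\delta)$-nearly best approximation to $f$, and since $\Phi_f$ is
a modulus of uniqueness,

$$\|\mathcal{A}(f) - \mathcal{A}(f')\|_1 < \delta.$$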

□ 

Our goal in the remainder of this section is to find a modulus of uniqueness 
corresponding to Jackson’s Theorem, closely following [53]. More precisely, 
we want to find a modulus corresponding to the particular proof given in the 
previous section—we think of this as the process of making our arguments 
more quantitative: instead of working with best approximations, we’ll work 
with nearly best approximations; instead of working with non-zero values, 
we’ll include numerical bounds on how large the values must be; and so on. 

We begin with a simple (but, as it turns out, essential) observation: if
$\|p\|_1 > 2\|f\|_1$ then $\|f - 0\|_1 < \|f - p\|_1$, and therefore when considering
approximations, we can consider only those polynomials which satisfy

$$\|p\|_1 \le 2\|f\|_1.$$

Definition 3.3. $Q_n \subseteq \mathcal{P}_n$ is the set of polynomials $p$ of degree $\le n$ such
that $\|p\|_1 \le 2\|f\|_1$.

In particular, the best approximation to $f$ must belong to $Q_n$. From here
on we will restrict our attention to $Q_n$.

3.1. A Quantitative Lemma 2.8. We can expect that in order to obtain
a quantitative version of the main theorem, we’ll need quantitative versions
of the intermediate lemmas as well. We’ll start, naturally, with Lemma 2.8:
we need to strengthen the conclusion so that $\|g - \lambda h\|_1 + \delta < \|g\|_1$ for some
fixed value $\delta$.

Of course, the stronger conclusion isn’t true unless we make stronger
assumptions as well: in order to get a quantitative conclusion, we’ll need
quantitative assumptions. We look through the proof to find out which steps
contribute to the size of the gap between $\|g\|_1$ and $\|g - \lambda h\|_1$, and then take
steps to bound those values.

The actual guarantee that the gap exists is given by (4), where we can
see that the size of the gap is $|\lambda|\left(\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx\right)$. This gives us
two values we need to bound away from 0: $\lambda$ and $\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx$.
The latter is shown to be positive at (3); from this and (2), we can see that
there are two factors contributing to the size of $\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx$:
the size of $\left|\int h \operatorname{sgn} g\,dx\right|$ and the size of $\int_B |h(x)|\,dx$. The first of these is
simply one of our assumptions: instead of making the qualitative assumption
that $\int h \operatorname{sgn} g\,dx \ne 0$, we will make a quantitative assumption that
$\left|\int h \operatorname{sgn} g\,dx\right| \ge \theta$ for some appropriate $\theta$.

We turn to the bound on $\int_B |h(x)|\,dx$. The bound on this value is given
by (1), where it is established that

$$\int_B |h(x)|\,dx \le \|h\|_\infty \cdot k \cdot 2\varepsilon.$$

A bound on $\|h\|_\infty$ is just an additional assumption, as is a particular value
for $k$, but $\varepsilon$ is some value we have chosen. As long as we are willing to make $\varepsilon$
small, we can make $\int_B |h(x)|\,dx$ as small as we want. So as we make $\varepsilon$ smaller,
we make $\int_B |h(x)|\,dx$ smaller, and therefore seem to make $\|g\|_1 - \|g - \lambda h\|_1$
larger. However, when we turn to $\lambda$, we will see that making $\varepsilon$ small may
force $\lambda$ to be small, which makes $\|g\|_1 - \|g - \lambda h\|_1$ small.

$\lambda$ was given by the rule that $|\lambda| \cdot \|h\|_\infty < m = \inf\{|g(x)| : x \in A\}$, so to
keep $\lambda$ bounded away from 0, we need to bound $m$ away from 0. But here
we run into a problem: what if $g$ has a point which dips very close to 0, but
isn’t near any of the zeros? Then this point is included in the set $A$, but it
forces $m$ to be very small, which in turn forces $\lambda$ to be small, and therefore
causes $\|g - \lambda h\|_1$ to be very close to $\|g\|_1$.

Suppose $g_\zeta$ is a function which “almost” has a zero, say
$g_\zeta(x) = (x - 1/2)^2 + \zeta$ where $\zeta$ is very small. If we take the proof given in the previous
section at face value, $\zeta$ being small just forces us to choose $\lambda_\zeta$ very small.
Indeed, following this argument, as $\zeta$ approaches 0, the gap
$\|g_\zeta\|_1 - \|g_\zeta - \lambda_\zeta h\|_1$ approaches 0 as well.

When $\zeta$ reaches 0, however, the situation changes. Now 1/2 is a zero of
$g_0(x)$, and $\lambda$ only has to be concerned with the value of $g_0(x)$ when $x$ is
outside the interval $(1/2 - \varepsilon, 1/2 + \varepsilon)$. In particular, we can still choose a
value $\lambda_0$ so that $\|g_0\|_1 - \|g_0 - \lambda_0 h\|_1 > 0$. It turns out to be a general
principle that quantitative results ought not exhibit such discontinuities:
all parameters in our proof should vary continuously in the function $g$. It
is therefore not surprising that we can easily modify our proof to eliminate
this discontinuity: when $\zeta$ is very small, we could make $\|g_\zeta\|_1 - \|g_\zeta - \lambda_\zeta h\|_1$
larger by treating 1/2 as if it were a zero and removing the interval around
it. This has a price (it makes $k$ larger, because we have a new zero, which
in turn makes $\int_B |h(x)|\,dx$ larger, and therefore $\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx$
smaller), but when $\zeta$ is very small, we still obtain better bounds this way.

Instead of assuming that $g$ has finitely many zeros, we’ll assume there are
finitely many points where the value is below some parameter $\zeta$ (with the
number of such points and the size of $\zeta$ contributing to our ultimate bound
on $\|g\|_1 - \|g - \lambda h\|_1$). Stated like this, we still don’t have the formulation
quite right, since when $g$ is continuous, having any zeros at all will lead to
having uncountably many points below $\zeta$: clearly having many small points
“close together” doesn’t count. We will remove a small interval around each
almost-zero, so what matters (what forces the set $B$ to be large) is if there
are many points close to 0 which are separated by large gaps. It is more
convenient to jump to our ultimate purpose, and simply require that there
is a set $B$ of measure at most $\varepsilon$ containing all points $r$ with $|g(r)| < \zeta$.

We can now prove this quantitative formulation using essentially the same
argument as in the proof of Lemma 2.8, but filling in explicit calculations
where appropriate.


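Lemma 3.4. Let $g, h, \varepsilon, \zeta$ be given such that:

• $g$ and $h$ are $L_1$-functions,

• $\varepsilon > 0$ and $\zeta > 0$,

• $\left|\int h \operatorname{sgn} g\,dx\right| > 3K\varepsilon$,

• $\|h\|_\infty \le K$,

• There is a set $B$ of measure at most $\varepsilon$ such that if $|g(x)| < \zeta$ then
$x \in B$.

Then

$$\left\|g - \frac{\zeta}{2K} h\right\|_1 + \frac{\varepsilon\zeta}{2} < \|g\|_1.$$

Proof. Let $A = [0,1] \setminus B$. We have

$$\int_B |h|\,dx \le K\varepsilon.$$

We have

$$\left|\int_A h \operatorname{sgn} g\,dx\right| \ge \left|\int h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx > 2K\varepsilon$$

and therefore

$$\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h|\,dx > K\varepsilon.$$

Note that if $x \in A$ then $|g(x)| \ge \zeta$. Let $\lambda = \frac{\zeta}{2K}$. Then for any $x \in A$,
$|\lambda h(x)| \le \frac{\zeta}{2} < \zeta \le |g(x)|$. In particular, $\operatorname{sgn} g(x) = \operatorname{sgn}(g(x) - \lambda h(x))$ for
every $x \in A$. Then, by the same calculations as in the proof of Lemma 2.8,

$$\|g - \lambda h\|_1 \le \|g\|_1 - \lambda\left(\left|\int_A h \operatorname{sgn} g\,dx\right| - \int_B |h(x)|\,dx\right) < \|g\|_1 - \frac{\varepsilon\zeta}{2}.$$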


□ 


3.2. A Quantitative Lemma 2.9. We now turn to formulating a quantitative
version of Lemma 2.9. We would expect to need a quantitative
version of Lemma 2.10 to derive (5). In fact, a particularly nice version
which suffices for our purposes is available in the existing literature:

Theorem 3.5 (Markov brothers’ Inequality^2). If $p$ has degree $\le n$,
$\|p'\|_\infty \le 2n^2\|p\|_\infty$.

^2 This inequality is sometimes called just “Markov’s inequality”, but as that name is
also used for an inequality from probability theory, this name is perhaps less confusing.
The inequality given here was proven by Andrey Markov, the same one who gave his name
to the other Markov’s inequality. His younger brother Vladimir, also a mathematician, 
proved a generalization to multiple derivatives, leading to the name given here, which, 
properly, applies only to the generalization. 
An immediate consequence of this is

Lemma 3.6. If $p$ has degree $\le n$, $\|p\|_\infty \le 2(n+1)^2\|p\|_1$.

Proof. Let $q(x) = \int_0^x p(y)\,dy$ be the integral of $p$. Applying Markov’s
inequality to $q$ gives

\begin{align*}
\|p\|_\infty &\le 2(n+1)^2\|q\|_\infty \\
&= 2(n+1)^2 \sup_x \left|\int_0^x p(y)\,dy\right| \\
&\le 2(n+1)^2 \sup_x \int_0^x |p(y)|\,dy \\
&= 2(n+1)^2 \int_0^1 |p(y)|\,dy \\
&= 2(n+1)^2\|p\|_1.
\end{align*}


□ 
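
The bound of Lemma 3.6 can be spot-checked on a few polynomials. This is a numerical sketch only: the coefficient-list representation, the helper names, and the sample polynomials are our own choices, and both norms are approximated on a grid rather than computed exactly.

```python
def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def sup_norm(coeffs, steps=20_000):
    """Grid approximation of the sup norm on [0, 1]."""
    return max(abs(poly_eval(coeffs, i / steps)) for i in range(steps + 1))

def l1_norm(coeffs, steps=20_000):
    """Midpoint-rule approximation of the L1 norm on [0, 1]."""
    width = 1.0 / steps
    return sum(abs(poly_eval(coeffs, (i + 0.5) * width)) for i in range(steps)) * width

def lemma_3_6_holds(coeffs):
    """Check sup norm <= 2 (n+1)^2 * L1 norm, with n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    return sup_norm(coeffs) <= 2 * (n + 1) ** 2 * l1_norm(coeffs)

# 1, x, x - 1/2, x^3, and 8x^2 - 8x + 1 (a shifted Chebyshev polynomial)
samples = [[1.0], [0.0, 1.0], [-0.5, 1.0], [0.0, 0.0, 0.0, 1.0], [1.0, -8.0, 8.0]]
print([lemma_3_6_holds(c) for c in samples])
```

For the monomial $x^n$ the two sides are $1$ and $2(n+1)$, so the inequality has room to spare there; the check above says nothing about how sharp the constant is.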

As for stating a quantitative version of Lemma 2.9, our work in the previous
subsection gives us a good guess what will need to happen: we will need
to replace the assumption that $p$ is a best approximation with the assumption
that $p$ is a nearly best approximation, and the conclusion that $f - p$ has
$n + 1$ zeros with the conclusion that $f - p$ has a collection of $n + 1$ “almost” zeros
which are well spread out.

Lemma 3.7. Suppose $f$ is continuous and $p$ is a $\frac{\zeta}{20(n+2)^2}$-nearly best
approximation to $f$ in $Q_n$. Then there are $n + 1$ points $r_1 < \cdots < r_{n+1}$ such
that $r_{i+1} - r_i \ge \frac{1}{20(n+2)^2 n}$ for $i < n + 1$ and $|f(r_i) - p(r_i)| < \zeta$ for each
$i \le n + 1$.

Proof. Let $\theta = \frac{1}{20(n+2)^2 n}$, $\varepsilon = \frac{1}{10(n+2)^2}$, $g(x) = f(x) - p(x)$. Suppose the
conclusion is false; then there must be some $k \le n$ and some $r'_1 < \cdots < r'_k$
such that $|g(r'_i)| < \zeta$ for each $i \le k$, but whenever $|g(r)| < \zeta$ for any
$r \in [0,1]$, there is some $i \le k$ such that $|r - r'_i| < \theta$. In particular, the set
$B = \bigcup_{i \le k} (r'_i - \theta, r'_i + \theta)$ contains all points $r$ with $|g(r)| < \zeta$. Observe that

$$\mu(B) \le 2n\theta = \frac{1}{10(n+2)^2} = \varepsilon.$$

If $r'_i + \theta < r'_{i+1} - \theta$ then, since $g(x)$ is continuous, $g$ does not change
sign on the interval $[r'_i + \theta, r'_{i+1} - \theta]$. So all sign changes take place in some
interval $(r'_i - \theta, r'_i + \theta)$. We choose a subsequence $r_1, \ldots, r_m$ and consider
the polynomial

$$h'(x) = \prod_{i=1}^{m} (x - r_i).$$

By choosing the subsequence appropriately, we can ensure that $h(x) = (-1)^j h'(x)$
has the same sign as $g(x)$ whenever $x \notin B$. (Saying this precisely
is tricky: roughly, we want to include $r'_i$ in our list if
$\operatorname{sgn} g(r'_i - \theta) \ne \operatorname{sgn} g(r'_i + \theta)$, but the situation is complicated by the fact that we
could have $r'_{i+1} - \theta < r'_i + \theta$ and have the sign change occur somewhere in that
interval; in that case we would want to include either of $r'_i$ and $r'_{i+1}$ but not both.) Let
$K = \|h\|_\infty$; by Lemma 3.6, $\|h\|_1 \ge \frac{K}{2(n+1)^2}$.

On the other hand, we have

$$\left|\int_B h \operatorname{sgn} g\,dx\right| \le \int_B |h(x)|\,dx \le K\mu(B) = K\varepsilon,$$

and, taking $A = [0,1] \setminus B$,

$$\|h\|_1 = \int |h(x)|\,dx = \int_A |h(x)|\,dx + \int_B |h(x)|\,dx.$$

Putting these together gives

$$\int_A |h(x)|\,dx \ge \frac{K}{2(n+1)^2} - \frac{K}{10(n+1)^2} = \frac{2K}{5(n+1)^2}.$$

Therefore

\begin{align*}
\left|\int h \operatorname{sgn} g\,dx\right| &\ge \int_A h \operatorname{sgn} g\,dx - \int_B |h(x)|\,dx \\
&= \int_A |h(x)|\,dx - \int_B |h(x)|\,dx \\
&\ge \frac{2K}{5(n+1)^2} - \frac{K}{10(n+1)^2} \\
&= \frac{3K}{10(n+1)^2} \\
&> 3K\varepsilon.
\end{align*}

Therefore we may apply Lemma 3.4 and conclude that there is a $\lambda$ so
that $\|g - \lambda h\|_1 + \frac{\varepsilon\zeta}{2} < \|g\|_1$, contradicting the assumption that $p$ was a
$\frac{\zeta}{20(n+2)^2}$-nearly best approximation. □

3.3. A Quantitative Lemma 2.10. We now turn to Lemma 2.10. We replaced
one use of this lemma with the Markov brothers’ inequality, but there
is a second use, in the proof of the main theorem, which cannot be replaced
by the Markov brothers’ inequality (since that only applies to polynomials).
In its qualitative form, this lemma seems like an obvious fact about integrals,
but we now need a quantitative version. A quantitative formulation
should weaken the assumption to merely $\int |q(x)|\,dx \le \varepsilon$ for some bound $\varepsilon$.
Of course, we can no longer hope to have $q(x) = 0$ for every $x$, since we
can easily think of continuous functions which have small but non-zero integrals
but are not always 0. Therefore we expect to weaken the conclusion as
well: instead of having $\|q\|_\infty = 0$, we might hope to show that $\|q\|_\infty \le \delta$
for some value $\delta$.

We can easily imagine continuous functions $q$ with $\|q\|_1$ small but $\|q\|_\infty$
quite large: $q$ could be 0 except on a small “bump” where it gets very large.
By making the bump very narrow, we can let the bump be very tall, causing
$\|q\|_\infty$ to be large even though $\|q\|_1$ is staying small. These bumps don’t
create discontinuity, but it’s not unreasonable to say that a function with
a tall, thin bump is “less” continuous than functions without such features.
We can quantify this effect:

Definition 3.8. Let $q$ be a continuous $L_1$ function. A modulus of uniform
continuity of $q$ is a function $\omega_q(\varepsilon)$ such that for any $\varepsilon > 0$ and any $x, y \in [0,1]$
such that $|y - x| < \omega_q(\varepsilon)$, $|q(y) - q(x)| < \varepsilon$.

It’s not hard to see that the modulus of uniform continuity is sufficient
to give a quantitative version of the lemma: if $|q(x)| > \varepsilon$ for some $x$, the
modulus of uniform continuity ensures that $|q(y)| > \varepsilon/2$ when $y$ is near $x$,
and this ensures that $\int |q(x)|\,dx$ cannot be too small. To make this precise:

Lemma 3.9. Let $q$ be a function with modulus of continuity $\omega_q$ and

$$\int |q|\,dx \le \frac{\varepsilon}{2} \min\left\{\tfrac{1}{2}, \omega_q(\varepsilon/2)\right\}.$$

Then $\|q\|_\infty \le \varepsilon$.

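Proof. We prove the contrapositive. If $|q(x)| > \varepsilon$ then for every $y$ with
$|y - x| < \omega_q(\varepsilon/2)$, $|q(y)| > \varepsilon/2$. We cannot be sure the whole interval
$(x - \omega_q(\varepsilon/2), x + \omega_q(\varepsilon/2))$ belongs to [0,1], but setting $\eta = \min\{\frac{1}{2}, \omega_q(\varepsilon/2)\}$,
we can be sure that if $x \le 1/2$ then $(x, x + \eta) \subseteq [0,1]$, while if $x \ge 1/2$ then
$(x - \eta, x) \subseteq [0,1]$. The two cases are symmetric, so consider the case where
$x \le 1/2$, and therefore

\begin{align*}
\int |q(y)|\,dy &= \int_0^x |q(y)|\,dy + \int_x^{x+\eta} |q(y)|\,dy + \int_{x+\eta}^1 |q(y)|\,dy \\
&> 0 + (\varepsilon/2)\eta + 0 \\
&= (\varepsilon/2)\eta.
\end{align*}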


□ 
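
The trade-off in Lemma 3.9 can be seen on the “tall, thin bump” itself. The sketch below uses a tent-shaped bump of our own choosing; it is Lipschitz with constant height/half_width, so $\varepsilon \cdot \text{half\_width} / \text{height}$ is one valid modulus of uniform continuity (an assumption of this example, not a formula from the text). Taking $\varepsilon$ equal to the bump height, the integral is compared with the quantity $\frac{\varepsilon}{2}\min\{\frac{1}{2}, \omega_q(\varepsilon/2)\}$ from the lemma.

```python
def l1_norm(func, steps=20_000):
    """Midpoint-rule approximation of the integral of |func| over [0, 1]."""
    width = 1.0 / steps
    return sum(abs(func((i + 0.5) * width)) for i in range(steps)) * width

def tent(height, half_width, center=0.5):
    """A continuous bump that vanishes outside (center - half_width, center + half_width)."""
    return lambda x: height * max(0.0, 1.0 - abs(x - center) / half_width)

def tent_modulus(height, half_width):
    """A modulus of uniform continuity for the tent, from its Lipschitz constant."""
    return lambda eps: eps * half_width / height

def threshold(modulus, eps):
    """The integral bound appearing in the hypothesis of Lemma 3.9."""
    return (eps / 2) * min(0.5, modulus(eps / 2))

results = []
for height, half_width in [(1.0, 0.1), (10.0, 0.01), (100.0, 0.01)]:
    q = tent(height, half_width)
    omega = tent_modulus(height, half_width)
    # The sup norm equals `height`, so the integral must exceed the threshold at eps = height.
    results.append((height, l1_norm(q), threshold(omega, height)))
print(results)
```

Making the bump narrower lowers the integral, but it lowers the modulus (and hence the threshold) at the same rate, which is the content of the lemma.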

Now when we apply Lemma 3.9, we need to know the modulus of continuity
for the function we apply the lemma to. The only application in our
proof not covered by the Markov brothers’ inequality is to the function

$$\tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)|.$$

$f$ is a function given to us; it is no surprise that our bounds will depend
on the modulus of continuity of $f$, so we’ll include as an assumption that we are
given particular bounds on the modulus of continuity of $f$.

For the polynomials $p_1, p_2$, we can use a combination of the Markov brothers’
inequality and the fact that we restricted our polynomials to those in
$Q_n$ to obtain a modulus of continuity. (This is where we use the fact that
we are optimizing over $Q_n$ instead of $\mathcal{P}_n$.)

Lemma 3.10. Let $p \in Q_n$. Then $\omega_p(\varepsilon) = \frac{\varepsilon}{4n^2(n+1)^2\|f\|_1}$ is a modulus of
uniform continuity for $p$.
Proof. $p$ is differentiable, so for any $x < y$ in [0,1],

$$|p(y) - p(x)| \le \int_x^y |p'(z)|\,dz \le \|p'\|_\infty \cdot (y - x).$$

Applying the Markov brothers’ inequality to $p$, we have

$$\|p'\|_\infty \le 2n^2\|p\|_\infty \le 2n^2(n+1)^2\|p\|_1 \le 4n^2(n+1)^2\|f\|_1.$$

Therefore for any $x < y$,

$$|p(y) - p(x)| \le 4n^2(n+1)^2\|f\|_1 (y - x)$$

and so the function $\omega_p(\varepsilon) = \frac{\varepsilon}{4n^2(n+1)^2\|f\|_1}$ is a modulus of uniform continuity
for $p$.


□ 


The function we ultimately need is a linear combination of absolute values 
of functions. It suffices to observe the following: 

Lemma 3.11. Let $f, g$ be functions, let $c$ be a constant, and let $\omega_f, \omega_g$ be
corresponding moduli of uniform continuity. Then:

(1) $\omega_f$ is a modulus of uniform continuity for $|f(x)|$,

(2) $\omega_{cf}(\varepsilon) = \omega_f(\varepsilon/|c|)$ is a modulus of uniform continuity for $cf(x)$,

(3) $\omega_{f+g}(\varepsilon) = \min\{\omega_f(\varepsilon/2), \omega_g(\varepsilon/2)\}$ is a modulus of uniform continuity
for $f(x) + g(x)$.

Proof. (1) Suppose $|y - x| < \omega_f(\varepsilon)$. Then $\bigl||f(y)| - |f(x)|\bigr| \le |f(y) - f(x)| < \varepsilon$.

(2) Suppose $|y - x| < \omega_f(\varepsilon/|c|)$. Then $|cf(y) - cf(x)| = |c| \cdot |f(y) - f(x)| < |c|\varepsilon/|c| = \varepsilon$.

(3) Suppose $|x - y| < \min\{\omega_f(\varepsilon/2), \omega_g(\varepsilon/2)\}$. Then

$$|(f + g)(x) - (f + g)(y)| \le |f(x) - f(y)| + |g(x) - g(y)| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$


□ 


By the same argument, we may replace the last item in the lemma with
$\omega_{f+g+h}(\varepsilon) = \min\{\omega_f(\varepsilon/3), \omega_g(\varepsilon/3), \omega_h(\varepsilon/3)\}$ for the sum of three functions.
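
The rules of Lemma 3.11 compose mechanically, which the following sketch makes explicit by treating a modulus as a Python function. The combinator names and the two sample moduli (for Lipschitz constants 2 and 5) are hypothetical choices of ours; the last step assembles a modulus for a combination of the shape $\frac{1}{2}|f - p_1| + \frac{1}{2}|f - p_2| - |f - p|$, assuming $p$, $p_1$, $p_2$ share one modulus.

```python
def modulus_abs(omega):
    """Rule (1): the same modulus works for |f|."""
    return omega

def modulus_scale(omega, c):
    """Rule (2): a modulus for c*f, for a non-zero constant c."""
    return lambda eps: omega(eps / abs(c))

def modulus_sum(*omegas):
    """Rule (3) and its three-term variant: a modulus for a sum of k functions."""
    k = len(omegas)
    return lambda eps: min(omega(eps / k) for omega in omegas)

omega_f = lambda eps: eps / 2.0   # assumed: f is Lipschitz with constant 2
omega_p = lambda eps: eps / 5.0   # assumed: p, p1, p2 are Lipschitz with constant 5

# f - p (and likewise f - p1, f - p2)
omega_diff = modulus_sum(omega_f, modulus_scale(omega_p, -1.0))

# (1/2)|f - p1| + (1/2)|f - p2| - |f - p|
omega_q = modulus_sum(
    modulus_scale(modulus_abs(omega_diff), 0.5),
    modulus_scale(modulus_abs(omega_diff), 0.5),
    modulus_scale(modulus_abs(omega_diff), -1.0),
)
print(omega_q(0.6), min(omega_f(0.6 / 6), omega_p(0.6 / 6)))
```

For these increasing moduli the composite collapses to $\min\{\omega_f(\varepsilon/6), \omega_p(\varepsilon/6)\}$, the term with coefficient of absolute value 1 being the binding one.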

3.4. A Quantitative Version of Jackson’s Theorem. Finally, we turn
to the main theorem, Theorem 2.11. We know that we need to strengthen
the assumption by stipulating a modulus of continuity for the function $f$,
and that in exchange we expect to calculate a modulus of uniqueness for the
approximation of $f$ in $Q_n$.

Once again, we go through the proof systematically adding quantitative
information to statements. Suppose that instead of being best approximations,
$p_1$ and $p_2$ are $\varepsilon$-nearly best approximations. Our goal is to obtain
some kind of bound on $\|p_1 - p_2\|_1$. The function $p(x) = \frac{1}{2}(p_1(x) + p_2(x))$ is
no longer a best approximation, but it is still an $\varepsilon$-nearly best approximation.
As a result, the function

$$q(x) = \tfrac{1}{2}|f(x) - p_1(x)| + \tfrac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)|$$

is still non-negative, and has a small (but not necessarily 0) integral.

The main difference is that we will ultimately obtain points $r_1, \ldots, r_{n+1}$
with the property that $|p_1(r_i) - p_2(r_i)|$ is small, but not necessarily 0. In the
qualitative version, we used the presence of $n + 1$ zeros, and the fact that
$p_1 - p_2$ was a polynomial of degree at most $n$, to conclude that $p_1 - p_2$ was
constantly 0. A quantitative version should allow us to conclude from the
presence of $n + 1$ “well-separated almost-zeros” that $p_1 - p_2$ is close to the
constantly 0 polynomial.

We know that $n + 1$ points are enough to specify a polynomial of degree
at most $n$, so if we write down any polynomial of degree at most $n$ going
through all the points $(r_i, p_1(r_i) - p_2(r_i))$, we will have written down the
polynomial $p_1 - p_2$. We just need to express this polynomial in a way that
makes explicit that the polynomial is always small. As it happens, one of
the most common ways of writing down a polynomial from its zeros has
precisely this property: the Lagrange interpolant is given by

$$L(x) = \sum_{i=1}^{n+1} (p_1(r_i) - p_2(r_i)) \prod_{j \ne i} \frac{x - r_j}{r_i - r_j}.$$

Observe that for any $k$, for each $i \ne k$ the term $\prod_{j \ne i} \frac{r_k - r_j}{r_i - r_j}$ is 0, and therefore

$$L(r_k) = (p_1(r_k) - p_2(r_k)) \prod_{j \ne k} \frac{r_k - r_j}{r_k - r_j} = p_1(r_k) - p_2(r_k),$$

so the Lagrange interpolant does go through the desired points. It is also
clear that we have retained exactly enough information about the points
$(r_i, p_1(r_i) - p_2(r_i))$ to place a bound on $\|L\|_\infty$: we know that each
$p_1(r_i) - p_2(r_i)$ is small and that as long as $j \ne i$, $r_j - r_i$ is not too small.
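
As a sanity check on the interpolation step, the sketch below rebuilds a polynomial of degree 2 from its values at three nodes using exact rational arithmetic. The polynomial `d` stands in for $p_1 - p_2$ and, like the nodes, is a made-up example; the final bound multiplies the number of nodes, the largest node value, and $\text{gap}^{-n}$, where gap is the smallest distance between consecutive nodes.

```python
from fractions import Fraction

def lagrange_interpolant(nodes, values):
    """Return L with L(x) = sum_i values[i] * prod_{j != i} (x - r_j) / (r_i - r_j)."""
    def L(x):
        total = Fraction(0)
        for i, (r_i, v_i) in enumerate(zip(nodes, values)):
            term = Fraction(v_i)
            for j, r_j in enumerate(nodes):
                if j != i:
                    term *= Fraction(x - r_j) / Fraction(r_i - r_j)
            total += term
        return total
    return L

d = lambda x: 3 * x**2 - 2 * x + Fraction(1, 4)   # hypothetical stand-in for p_1 - p_2
nodes = [Fraction(0), Fraction(1, 2), Fraction(1)]
values = [d(r) for r in nodes]
L = lagrange_interpolant(nodes, values)

grid = [Fraction(k, 7) for k in range(8)]
gap = min(b - a for a, b in zip(nodes, nodes[1:]))
n = len(nodes) - 1
bound = len(nodes) * max(abs(v) for v in values) * gap ** (-n)
print(all(L(x) == d(x) for x in grid), bound)
```

Each factor $|x - r_j| / |r_i - r_j|$ is at most $1/\text{gap}$ on [0,1], which is where the $\text{gap}^{-n}$ in the bound comes from.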

Stated formally, we have the following theorem: 

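Theorem 3.12. Let $f$ be a function with modulus of continuity $\omega_f$. Then
for any $n$,

$$\Phi(\varepsilon) = \frac{\zeta}{2} \min\left\{ \frac{1}{10(n+2)^2},\ \omega_f(\zeta/12),\ \frac{\zeta/2}{24n^2(n+1)^2\|f\|_1} \right\}$$

where $\zeta = \frac{\varepsilon}{4(n+1)20^n(n+2)^{2n}n^n}$, is a modulus of uniqueness for the approximation
by $Q_n$.

Proof. We fix some values: $\gamma = \frac{1}{20(n+2)^2 n}$, $\nu = \frac{\varepsilon\gamma^n}{n+1}$, $\zeta = \frac{\nu}{4}$, and

$$\rho = \frac{\zeta}{2} \min\left\{ \frac{1}{10(n+2)^2},\ \omega_f(\zeta/12),\ \frac{\zeta/2}{24n^2(n+1)^2\|f\|_1} \right\}.$$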

Let pi,P 2 be p-nearly best approximations to / in Let p{x) = 

\{Pi{x) -\-p 2 {x))- p is also an p-nearly best approximation to / since 

11/ - pill < ^11/ -Pilli + ^11/ -P2||i. 

This means the equation ([B]) gets weakened to 
/■ 1, ..1 
But (7),

\[ \frac{1}{2}|f(x) - p_1(x)| + \frac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)| \geq 0, \]

is still valid.

Let

\[ q(x) = \frac{1}{2}|f(x) - p_1(x)| + \frac{1}{2}|f(x) - p_2(x)| - |f(x) - p(x)|. \]

We wish to apply Lemma 13.91 which requires a modulus of continuity. We 
have the modulus Uf for / and the modulus 0Jp{5) = for p,pi, 

and p 2 , so we may apply Lemma 13.111 to obtain the modulus of continuity 

U,(S) = min{u,KV6), + i)2||;||d - 

In particular, p < | min{l/2,a;g(C/2)}, so by Lemma (3.91 we must have 
I l^l loo < C- 

p < 2 o(J+ 2 )^ ’ Lemma [3771 we have n + 1 points ri < • • • < r^+i such 
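that $r_{i+1} - r_i > \gamma$ for $i < n + 1$ and $|f(r_i) - p(r_i)| < \zeta$ for each $i \leq n + 1$.
Therefore for each $i \leq n + 1$,

\[
\begin{aligned}
|p_1(r_i) - p_2(r_i)| &\leq |f(r_i) - p_1(r_i)| + |f(r_i) - p_2(r_i)| \\
&\leq 2[q(r_i) + |f(r_i) - p(r_i)|] \\
&< 2[\zeta + \zeta] \\
&= \nu.
\end{aligned}
\]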


The Lagrange interpolant gives us an expression for the polynomial pi(x)— 
P2{x)-. 

n+1 


Pl{x) -P2{x) = -P2{ri)) n 


X — r, 


We may bound this: 

\Pi{x) -P2{x)\ = 


i=l j^i 

n+1 

(I’l ) -P2 (?’*)) n 

i=l 
n+1 


n - n 


X — r,- 


. ,. n — r,- 


< \Piin) -P2{ri)\ n 


X — To 


i=l 

n+1 


. n — r,' 


< 51 

i=l 

= nv'y~'^ 
= T 


and therefore 
□

An additional merit of this proof is that we have obtained information
on the uniformity of the bounds: we now know that if someone gives us
only $\omega_f$ and $\|f\|_1$, we can compute a modulus $\Phi$ which will work for any
function $g$ for which $\omega_f$ is a modulus of continuity and with $\|g\|_1 \leq \|f\|_1$.

4. The Functional Interpretation 

We have now seen that it was possible not only to prove an explicit quantitative
version of Jackson’s Theorem, but to find such a proof closely related
to the qualitative proof. Still, the arguments we used appeared to be somewhat
ad hoc, and we seem to have gotten lucky in various places. We made a
string of fortunate guesses, for instance replacing $\mathcal{P}_n$ with $Q_n$, or strengthening
the assumption of Lemma 3.4 and therefore weakening the conclusion of
Lemma 3.7 in a way which just so happened to allow us to finish the proof
anyway. We also depended on existing arguments, like the Markov brothers’
inequality and the properties of the Lagrange interpolant, to complete our
quantitative proof.

Our goal in this section is to describe a formalization of the procedure 
used in the previous section which applies systematically to certain kinds 
of proofs. (Indeed, this method is precisely the way the arguments in the 
previous section were found by Kohlenbach and Oliva.) 

In order to make this precise, we need to pin down what counts as a 
quantitative calculation. For our purposes, we will identify “quantitative” 
with “computable”. Then our goal in this section is to describe how to 
systematically take a proof of a statement and: 

(1) Find a related statement for which it is appropriate to investigate 
the existence of computable bounds, 

(2) Convert the proof into a calculation of those computable bounds. 

As mentioned in the introduction, one of the things that makes the functional
interpretation useful in practice is that we can take two different
perspectives on it. In the first perspective, the functional interpretation is
a completely formal idea; it refers to a family of theorems of the following
form:

Theorem 4.1. Suppose that there is a proof of $\phi$ in the formal theory $T$.
Then there is a proof of $\phi^{ND}$ in the formal theory $T'$.

Here $T$ is some particular formal theory, $T'$ is a theory related to $T$ but
with the additional property that proofs in $T'$ are constructive, and $\cdot^{ND}$
is an operation mapping formulas of first-order logic to formulas in some
particular nice form. The main reason we view these theorems as instances
of a uniform idea is that the transformation $\phi \mapsto \phi^{ND}$ is very similar across
all these theorems. Such theorems have been proven for a variety of theories
$T$.
The second perspective is that the functional interpretation is a heuristic:
across many theories and situations, there is an operation which takes any
formula $\phi$ (in particular, where there may be no computable bounds at all
for $\phi$) and transforms $\phi$ to $\phi^{ND}$, a statement for which computable bounds
do exist, which is reliably implication preserving: if we can conclude $\psi$ from
$\phi$ then we can also conclude $\psi^{ND}$ from $\phi^{ND}$. Since we can convert individual
mathematical statements into formulas $\phi$ without too much difficulty, we can
also convert mathematical statements into the quantitative formula $\phi^{ND}$.

Taking the first perspective would require choosing a suitable formal theory
$T$, often the theory of Peano arithmetic or a variant, and carefully
formalizing our statement in this theory. This tends to involve a great deal 
of tedious coding—one must interpret statements about the real numbers 
as statements about sequences of integers, statements about integration as 
formulas involving quantification over partitions, and so on. Nonetheless, 
the formal approach is often useful, and can give insights that the informal 
approach cannot. 

Here, however, we will take the second approach, and work only semi- 
formally. We will use the notation of first-order logic, but deal somewhat 
informally with our exact choice of formal language and theory. 

4.1. Formalizing Statements. The first thing we need to do is translate
ordinary mathematical language into the more formal language of first-order
logic. As promised, we will work semi-formally, writing formulas using quantifiers
$\forall$, $\exists$ and the connectives $\wedge$ (and), $\vee$ (or), and $\to$ (implies), but without
pinning down an exact language. In particular, we will freely write things
like $\forall q \in \mathbb{Q}\; \exists n \in \mathbb{N} \ldots$ to indicate quantification over various particular
domains.

In order to get meaningful results, we do need some restrictions on the 
formulas we use. (These restrictions stand in for actually formalizing a 
statement in a particular fixed language—essentially, the restrictions we are 
choosing are the ones which ensure that a proper formalization is possible.) 
The most important restriction is that we only quantify over countable domains.
Since Jackson’s theorem concerns notions like functions on the reals,
this seems like a significant limitation. We will work around this by taking
advantage of the fact that many of the uncountable domains we are interested
in (like the reals) are separable, and therefore many statements can
be approximated by quantifying over the countable dense subset.

We will write $\mathbb{Q}^+$ for $\mathbb{Q} \cap (0, \infty)$, which is one such countable domain. We
will write $\mathcal{P}^{\mathbb{Q}}_n$ for the polynomials of degree at most $n$ with rational coefficients
(which, of course, are dense in $\mathcal{P}_n$ under an appropriate topology).

In addition, we need a restriction on our atomic formulas. The correct
restriction is that our atomic formulas should represent only computable
operations; making sense of that formally would require giving computable
interpretations to things like real numbers and functions on real numbers.
We will take a short-cut: our atomic formulas $\phi(x)$ (where $x$ is one or more
free variables) will always have the form $f(x) < \epsilon$ or $f(x) \leq \epsilon$ where $f$ is
some continuous function. (This is closely related to continuous logic [5].)

Having just imposed the requirement that we work with countable domains,
we will immediately allow a single exception; we allow a single, outermost
universal quantifier over an uncountable domain. That is, we work
with formulas of the form

\[ \forall X \in U\; \phi(X) \]

where $U$ may be uncountable, but all quantifiers in $\phi$ must be over countable
domains.¹

To illustrate the idea, let us consider how we translate the uniqueness
part of Jackson’s theorem into a formula of the specified form. In English,
the uniqueness part of Jackson’s theorem says

Let $f$ be a continuous $L_1$-function on $[0,1]$. Then for any $n$,
there is at most one best approximation to $f$ in $\mathcal{P}_n$.

The collection of all continuous $L_1$-functions is uncountable, but we can
include $f$ in the outermost uncountable quantifier. We will have to write
formulas involving the values of $x$, so we want to be able to quantify over
the domain of $f$; we can take the outermost quantifier over $\mathbb{R}^{\mathbb{Q} \cap [0,1]}$ since a
continuous function is determined by its values at the rationals.

We first need a formula $\mathrm{cont}(f)$, which should hold exactly when $f$ is
continuous. The usual $\epsilon$-$\delta$ formulation of continuity would be appropriate,
but since $[0,1]$ is compact, continuity is equivalent to uniform continuity,
which will slightly simplify things later. So we take

\[ \mathrm{cont}(f) = \forall \epsilon \in \mathbb{Q}^+\; \exists \delta \in \mathbb{Q}^+\; \forall x, y \in \mathbb{Q} \cap [0,1]\; (|x - y| < \delta \to |f(x) - f(y)| < \epsilon). \]

Saying $f$ is $L_1$ is easy:

\[ L_1(f) = \exists M \int |f(x)|\,dx < M. \]

Note that we insist on expanding $\mathrm{cont}(f)$ into the $\epsilon$-$\delta$ form but are willing
to treat $\int |f(x)|\,dx < M$ as a single statement, rather than writing it out in
terms of a partition.

We would like the conclusion to say that if $p_1, p_2$ are best approximations
of $f$ then $p_1 = p_2$. But the real best approximation might not have rational
coefficients, and we only want to quantify over $\mathcal{P}^{\mathbb{Q}}_n$. We therefore want to
reformulate the statement that there is a best approximation in terms of
$\mathcal{P}^{\mathbb{Q}}_n$.

Suppose there were distinct best approximations; then there would be
$p_1, p_2$ with $\|p_1 - p_2\|_1 > \epsilon$ for some $\epsilon$. Since each of $p_1, p_2$ is arbitrarily well
approximated by elements of $\mathcal{P}^{\mathbb{Q}}_n$, we could find $p'_1, p'_2 \in \mathcal{P}^{\mathbb{Q}}_n$ with $\|p'_i - p_i\|_1$
arbitrarily small (and therefore $p'_i$ a $\delta$-nearly best approximation for $\delta$ as
small as we want) and with $\|p'_1 - p'_2\|_1 > \epsilon$. Therefore we should state that
this does not happen:

¹This is actually a harmless modification. Formally, we could augment our language
of first-order logic by a new predicate or function symbol, $X$. Since there are no defining
axioms for $X$, proving a formula $\phi(X)$ is equivalent to proving $\forall X \in U\; \phi(X)$.

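\[
\begin{aligned}
\mathrm{approx}(f) = \forall \epsilon \in \mathbb{Q}^+\; \exists \delta \in \mathbb{Q}^+\; & \forall p'_1 \in \mathcal{P}^{\mathbb{Q}}_n\; \forall p'_2 \in \mathcal{P}^{\mathbb{Q}}_n \\
& [\|p'_1 - p'_2\|_1 > \epsilon \to \exists p' \in \mathcal{P}^{\mathbb{Q}}_n \\
& \quad (\|f - p'\|_1 + \delta < \|f - p'_1\|_1 \\
& \quad \vee \|f - p'\|_1 + \delta < \|f - p'_2\|_1)].
\end{aligned}
\]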

Putting this all together, the uniqueness part of Jackson’s theorem is:

(8) $\forall f \in \mathbb{R}^{\mathbb{Q} \cap [0,1]}\; (\mathrm{cont}(f) \wedge L_1(f) \to \mathrm{approx}(f))$.

4.2. Extracting Quantitative Statements. Once we have placed a formula
in the form

\[ \forall x\, \exists y\, \phi(x, y) \]

where $x$ and $y$ may be tuples of multiple variables, there is a natural way
to identify a potential quantitative analog of this formula: replace the existence
of $y$ with some calculated value, $\forall x\, \phi(x, g(x))$, for some reasonable
function $g$. In practice, computing exact values is often messy, so it is usually
enough to settle for some kind of bound: $\forall x\, \exists y \in G(x)\; \phi(x, y)$, where $G(x)$
is always some sort of bounded (really, compact) set. In the simplest, but
representative, case, $y$ ranges over the natural numbers and $G(x)$ always has
the form $[0, n]$, so $G(x)$ really gives a bound on the size of $y$.

The functional interpretation depends on the fact that whether or not
there is a computable $g$ is closely related to the syntactic properties of $\phi$: if
every quantifier in $\phi$ is over a compact domain (and our formalization was
appropriate) then there is guaranteed to be a computable $g$. Conversely, if
there is a computable bound $g$ then there must be some formula equivalent
to $\phi$ which contains only quantifiers over compact domains. (The one complication
to this equivalence is that when $\phi$ has quantifiers over non-compact
domains, there is no easy way to tell whether it is equivalent to a simpler
formula.)

To see why this should be the case, consider the simplest situation: $x$ and $y$
are natural numbers and $\phi$ is a formula of arithmetic, where the only domain
being quantified over is the natural numbers. If all quantifiers in $\phi$ are over
compact subsets of the natural numbers then all quantifiers in $\phi$ are really
over finite sets. The atomic formulas should themselves be computable, and
therefore there is a computer program which, given values $n, m$, checks in
finite time whether or not $\phi(n, m)$ is true. If $\forall x\, \exists y\, \phi(x, y)$ is true then
we can define a computable function $g$ by unbounded search: given the input $n$,
$g(n)$ first checks whether $\phi(n, 0)$ holds; if so, $g(n)$ returns 0, and otherwise,
$g(n)$ checks whether $\phi(n, 1)$ holds, if so returns 1, and if not continues similarly.
The truth of the statement guarantees that the search terminates. (The standard encoding of
computability in arithmetic gives the other half of a correspondence: any
computable function can be converted into a formula of the right form.)
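
The search procedure just described can be sketched in a few lines of Python. This is only an illustration: `make_witness_function` and the sample predicate `phi` are names chosen here, and the sketch assumes that `phi(n, m)` is decidable and that a witness exists for every input (otherwise the loop does not terminate).

```python
from itertools import count


def make_witness_function(phi):
    """Given a decidable predicate phi(n, m), return g with g(n) equal to
    the least m such that phi(n, m) holds. Termination relies on the
    assumption that for every n some witness m exists."""
    def g(n):
        for m in count():          # try m = 0, 1, 2, ...
            if phi(n, m):
                return m
    return g


# Hypothetical example predicate: "m * m >= n".
phi = lambda n, m: m * m >= n
g = make_witness_function(phi)
```

For this sample predicate, `g(10)` returns the least `m` whose square is at least 10. The search is correct but carries no information about how large the witness is; obtaining such information is what the rest of the section is about.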

When we deal with more complicated spaces, as in the previous subsection,
we want to distinguish between compact and non-compact quantifiers even
while both are countable. For instance, we usually want quantifiers over
$\mathbb{Q}$ to be non-compact while quantifiers over $\mathbb{Q} \cap [0,1]$ should be seen as
compact. So in place of quantifiers over compact domains, we sometimes
need quantifiers over countable dense subsets of compact domains. This
could be ambiguous, since every set is dense in some compact space (indeed, the
one-point compactification adds only a single point); however a choice of
topology is enforced by our choice of atomic formulas, which would not be
continuous with respect to the wrong compactification.

For our purposes, we adopt the following convention: we call a set effectively
compact if it is a countable dense subset of a compact set (where the
atomic formulas we are interested in come from functions which are continuous
with respect to the same topology). In practice, the only effectively
compact sets we are concerned with are obvious ones: $\mathbb{Q} \cap [a, b]$ and the set
of polynomials of degree $\leq n$ with coefficients from $\mathbb{Q} \cap [a, b]$. These are
clearly dense in the corresponding compact sets $[a, b]$ and the polynomials
of degree $\leq n$ with coefficients from $[a, b]$, respectively.

Note that effectively compact sets allow finite searches the same way finite
sets did: suppose $\mathcal{P}$ is the underlying, uncountable, compact space and $Q \subseteq \mathcal{P}$
is a countable dense subset, and we want to check whether $\exists x \in \mathcal{P}\; \phi(c, x)$
holds for some fixed constants $c \in \mathcal{P}$. We can choose a finite set $F \subseteq Q$
which is sufficiently dense (based on the moduli of uniform continuity of the
atomic formulas in $\phi$) and check, for each $d \in F$, whether $\phi(c, d)$ holds.
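
As a rough illustration of such a finite search, the Python sketch below treats the single atomic formula $f(x) < \epsilon$ on $\mathbb{Q} \cap [0,1]$. Everything in it is an assumption made for the example: the function name `search_witness`, the extra tolerance parameter `tol`, and the convention that `omega` is a modulus of uniform continuity in the sense that $|x - y| < \omega(\eta)$ implies $|f(x) - f(y)| < \eta$.

```python
from fractions import Fraction


def search_witness(f, omega, eps, tol):
    """Search a finite rational grid in [0, 1] for d with f(d) < eps.

    omega is assumed to be a modulus of uniform continuity for f.
    If a grid point d with f(d) < eps is found, it is returned.
    If None is returned, then f(x) > eps - tol for every x in [0, 1],
    because every x is within omega(tol) of some grid point.
    """
    n = int(1 / omega(tol)) + 1      # grid spacing 1/n < omega(tol)
    for k in range(n + 1):
        d = Fraction(k, n)
        if f(d) < eps:
            return d
    return None


# Hypothetical sample functions, both with modulus omega(eta) = eta.
omega = lambda eta: eta
near_third = lambda x: abs(x - Fraction(1, 3))
shifted = lambda x: x + Fraction(1, 2)
```

Here `search_witness(near_third, omega, Fraction(1, 10), Fraction(1, 20))` finds a rational witness, while the same call with `shifted` and `eps = Fraction(1, 4)` returns `None`. The tolerance reflects the fact that a finite grid can only decide the statement up to the precision fixed by the modulus.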

A formula in the form $\forall x\, \exists y\, \phi(x, y)$ where $\phi$ only has quantifiers over
effectively compact domains is called a $\Pi_2$ formula. (The $\Pi$ indicates that
the outermost quantifier is $\forall$ and the 2 indicates that there are two blocks
of quantifiers over non-compact domains.)

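Let’s consider what this means for Jackson’s Theorem. The formula (8)
we found above isn’t in the $\Pi_2$ form yet. Recall that in the previous section
we replaced the space $\mathcal{P}_n$ with the space $Q_n$. Now we see what motivated
this change: while the space $\mathcal{P}_n$ is not compact, $Q_n$ is compact; the corresponding
$Q^{\mathbb{Q}}_n$ is a countable set dense in the compact separable space $Q_n$,
so by replacing the quantifiers over $\mathcal{P}^{\mathbb{Q}}_n$ with quantifiers over $Q^{\mathbb{Q}}_n$, we bring
Jackson’s theorem closer to the $\Pi_2$ form. In fact, with this change, the
conclusion of the formula is now in the right form:

\[
\begin{aligned}
\forall \epsilon \in \mathbb{Q}^+\; \exists \delta \in \mathbb{Q}^+\; & \forall p'_1 \in Q^{\mathbb{Q}}_n\; \forall p'_2 \in Q^{\mathbb{Q}}_n \\
& [\|p'_1 - p'_2\|_1 > \epsilon \to \exists p' \in Q^{\mathbb{Q}}_n \\
& \quad (\|f - p'\|_1 + \delta < \|f - p'_1\|_1 \\
& \quad \vee \|f - p'\|_1 + \delta < \|f - p'_2\|_1)].
\end{aligned}
\]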

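We define the formula:

\[
\begin{aligned}
\mathrm{approxQ}(f, \epsilon, \delta) = {} & \forall p'_1 \in Q^{\mathbb{Q}}_n\; \forall p'_2 \in Q^{\mathbb{Q}}_n \\
& [\|p'_1 - p'_2\|_1 > \epsilon \to \exists p' \in Q^{\mathbb{Q}}_n \\
& \quad (\|f - p'\|_1 + \delta < \|f - p'_1\|_1 \\
& \quad \vee \|f - p'\|_1 + \delta < \|f - p'_2\|_1)].
\end{aligned}
\]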

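To deal with the assumptions $\mathrm{cont}(f)$ and $L_1(f)$, we use a technique called
Skolemization. The idea is to replace statements like

\[ \forall x \in \mathbb{Q}\; \exists y \in \mathbb{Q}\; \phi(x, y) \]

with the equivalent statement

\[ \exists Y \in \mathbb{Q}^{\mathbb{Q}}\; \forall x \in \mathbb{Q}\; \phi(x, Y(x)). \]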

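In other words, we can move existential quantifiers outwards by replacing
them with functions. For example, we can rewrite $\mathrm{cont}(f)$ as:

\[ \exists \omega \in (\mathbb{Q}^+)^{\mathbb{Q}^+}\; \forall \epsilon \in \mathbb{Q}^+\; \forall x, y \in \mathbb{Q} \cap [0,1]\; (|x - y| < \omega(\epsilon) \to |f(x) - f(y)| < \epsilon). \]

$\omega$ should look familiar: this is just the statement that $\omega$ is a modulus of
uniform continuity for $f$. We write

\[ \mathrm{ucont}(f, \omega) = \forall \epsilon \in \mathbb{Q}^+\; \forall x, y \in \mathbb{Q} \cap [0,1]\; (|x - y| < \omega(\epsilon) \to |f(x) - f(y)| < \epsilon). \]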
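
Since $\mathrm{ucont}(f, \omega)$ is a universal statement over rationals, its negation asserts the existence of a rational counterexample $(\epsilon, x, y)$, and such a counterexample can be searched for. The Python sketch below runs that search over a finite sample of rationals. The helper name `find_ucont_counterexample`, the sample function, and the two candidate moduli are illustrative assumptions, and a result of `None` only means that no counterexample exists in the sample, not that the modulus is valid everywhere.

```python
from fractions import Fraction
from itertools import product


def find_ucont_counterexample(f, omega, epsilons, grid):
    """Look for (eps, x, y) with |x - y| < omega(eps) but
    |f(x) - f(y)| >= eps, among the given finite samples.
    Returns the first such triple, or None if the sample contains none."""
    for eps, x, y in product(epsilons, grid, grid):
        if abs(x - y) < omega(eps) and abs(f(x) - f(y)) >= eps:
            return eps, x, y
    return None


# Hypothetical sample data: f(x) = x^2 on a rational grid in [0, 1].
square = lambda x: x * x
grid = [Fraction(k, 8) for k in range(9)]
epsilons = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
good_modulus = lambda eps: eps / 2     # valid, since |x^2 - y^2| <= 2|x - y|
bad_modulus = lambda eps: 2 * eps      # too generous
```

With `good_modulus` the search returns `None` on this sample, while with `bad_modulus` it returns a concrete triple witnessing the failure.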

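Both $\exists \delta \in \mathbb{Q}^+\, \mathrm{approxQ}(f, \epsilon, \delta)$ and $\neg\mathrm{ucont}(f, \omega)$ have the form $\exists y\, \phi(y)$ where
$\phi(y)$ has only quantifiers over effectively compact domains. Then we can
rewrite Jackson’s theorem as

\[
\begin{aligned}
\forall f \in \mathbb{R}^{\mathbb{Q} \cap [0,1]}\; \forall \omega \in (\mathbb{Q}^+)^{\mathbb{Q}^+}\; \forall M\; \forall \epsilon \in \mathbb{Q}^+\; [ & \neg\mathrm{ucont}(f, \omega) \;\vee \\
& \int |f(x)|\,dx \geq M \;\vee \\
& \exists \delta \in \mathbb{Q}^+\, \mathrm{approxQ}(f, \epsilon, \delta)].
\end{aligned}
\]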

So we have converted Jackson’s theorem into a $\Pi_2$ form. (Notice that Skolemization
doesn’t trivialize the importance of the $\Pi_2$ form, because the $\exists$ quantifier
in the $\Pi_2$ form is still supposed to range over a countable domain.)

Without going any further, we’ve already learned something: we expect
there to be a computable function which, given $f$, $\omega_f$, $M$ and $\epsilon$, computes
a $\delta$ such that $\mathrm{approxQ}(f, \epsilon, \delta)$. Actually, since we don’t use anything about
$f$ other than $\mathrm{ucont}(f, \omega_f)$ and $\int |f(x)|\,dx < M$, we don’t expect the bound to
depend on $f$: in other words, we expect the modulus of uniqueness to depend
only on $\omega_f$ and $M$. (In fact, as discussed in [33], more careful examination
of this statement shows that we can do a bit better: by replacing $f$ by
$\tilde{f}(x) = f(x) - f(0)$, we can eliminate the dependence on $\int |f(x)|\,dx$, so that
bounds depend only on the modulus of continuity.)

4.3. The Significance of Syntax. It is worth stating explicitly what the
previous subsection implied: if a theorem can be put into the $\Pi_2$ form
$\forall x\, \exists y\, \phi(x, y)$, we expect there to be a computable function $G$ so that $G(x)$
is always effectively compact and such that we can prove that $\forall x\, \exists y \in
G(x)\; \phi(x, y)$. Furthermore, as we will describe in the next subsection, we
expect to be able to take a proof of the original statement and systematically
convert it into a particular choice of $G$ and a proof of the bounded
statement.
If a theorem cannot be put into the $\Pi_2$ form then this might not happen.
It is possible that we can both prove $\forall x\, \exists y\, \forall z\, \phi(x, y, z)$ and also prove that
the value of $x$ does not suffice to give a computable bound on the value of
$y$. Further, it may be very hard to determine whether this is the case.

If we want to think about which proofs have constructive bounds and
which don’t, we have to begin with the fact that there is a qualitative difference
between $\Pi_2$ statements and other statements.

4.4. Proof Extraction. So suppose we have a proof of a $\Pi_2$ statement, as
in Jackson’s Theorem. We have said that we expect there to be a computable
bound, and we now turn to the question of extracting such a bound from a
proof of the original statement.

Let us begin by considering what we could do with completely formal
proofs, that is, proofs which consist of a sequence of formulas, with
each step justified by some formal axiom or inference rule. (We won’t worry
too much about the exact rules allowed; any reasonable choice will do.) If
every formula in the proof were in the $\Pi_2$ form then we would expect the
proof to directly provide an explicit algorithm: typically any $\Pi_2$ axiom will
have an obvious computable bound, and standard inference rules all preserve
the property of having explicit computable bounds.

However, many proofs of $\Pi_2$ statements have intermediate steps which
are not $\Pi_2$. Jackson’s theorem is an example: the statement of Lemma 2.8
is not $\Pi_2$. (Of course, this statement requires substantial massaging for it
to even be meaningful to ask whether the statement is $\Pi_2$, and showing
that a statement cannot, in any way, be rearranged into a $\Pi_2$ form is much
harder than showing that it can be. But the germ of the idea is given
in our discussion before Lemma 3.4: the bounds are discontinuous.) Such
statements break the flow of explicit computations through our proof, and
must usually be replaced by $\Pi_2$ statements if we want to recover explicit
bounds.

We’ll focus on Lemma 2.8 since it was in the process of quantifying that
lemma that we had to do the most work. We face some difficulty translating
the assumption that $g$ has at most $n$ zeros into our format. A first attempt
at a translation would look something like this:

\[ \forall g, h\; [(\exists n\, \exists x_1, \ldots, x_n\, \forall y\, [g(y) = 0 \to \exists i \leq n\; y = x_i]) \to \mathrm{rest}(g, h)] \]

where $\mathrm{rest}(g, h)$ is a formalization of “if $h$ is continuous and $\int h\, \mathrm{sgn}\, g\, dx \neq 0$
then for some $\lambda$, $\|g - \lambda h\|_1 < \|g\|_1$”. Standard manipulations on first-order
logic allow us to pull some of the quantifiers to the front:

\[ \forall g, h, n, x_1, \ldots, x_n\, \exists y\; [(g(y) = 0 \to \exists i \leq n\; y = x_i) \to \mathrm{rest}(g, h)]. \]

The $x_1, \ldots, x_n$ are real numbers which are part of the initial block of quantifiers
over uncountable domains, but $y$ is also an arbitrary real number. We
need to reformulate this so that we only need to consider rational values of
$y$; we can’t simply restrict the quantifier to rationals, though, since it could
well be that the zero itself actually occurs at a real.
We need to make the statement more quantitative; the trick is to start in
the right place. To say that $y = x_i$ is equivalent to saying that for every $\delta$,
$|y - x_i| < \delta$, and similarly for $g(y) = 0$. So we can rephrase this as:

\[ \forall g, h, n, x_1, \ldots, x_n\, \exists y\; [([\forall \zeta > 0\; |g(y)| < \zeta] \to \exists i \leq n\, \forall \delta > 0\; |y - x_i| < \delta) \to \mathrm{rest}(g, h)]. \]

We can now pull the quantifiers over $\zeta$ and $\delta$ out to the front; we actually
get to choose which quantifier will be the outermost, and we should make
this choice with the goal of minimizing the alternations of quantifiers, and
therefore the ultimate complexity of the statement:

\[ \forall g, h, n, x_1, \ldots, x_n\, \exists y\, \forall \zeta > 0\, \exists \delta > 0\; [(|g(y)| < \zeta \to \exists i \leq n\; |y - x_i| < \delta) \to \mathrm{rest}(g, h)]. \]

In this statement, it makes no difference whether we allow $y$ to range over
the real numbers or the rationals, so we choose the rationals to make this
a sentence of the allowed kind. (This is really a reflection of the fact that
we have chosen the correct atomic formulas: equality on reals is not computable,
and therefore should not be treated as an atomic formula.)

Since $y$ now quantifies over an effectively compact domain (the rationals
in the interval $[0,1]$), we can put it back inside the other quantifiers:

\[ \forall g, h, n, x_1, \ldots, x_n\, \forall \zeta > 0\, \exists \delta > 0\; [(\forall y\; |g(y)| < \zeta \to \exists i \leq n\; |y - x_i| < \delta) \to \mathrm{rest}(g, h)]. \]

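To complete the process of finding the form of Lemma 3.4 we expand
$\mathrm{rest}(g, h)$. Initially, we might have:

\[
\begin{aligned}
\forall g, h, \omega_g, \omega_h, n, x_1, \ldots, x_n\; & \forall \zeta > 0\; \forall \gamma > 0\; \exists \delta > 0 \\
[\, & \neg(\forall y\; |g(y)| < \zeta \to \exists i \leq n\; |y - x_i| < \delta) \\
& \vee \neg\mathrm{ucont}(g, \omega_g) \\
& \vee \neg\mathrm{ucont}(h, \omega_h) \\
& \vee \left| \int h\, \mathrm{sgn}\, g\, dx \right| < \gamma \\
& \vee \exists \lambda\; \|g - \lambda h\|_1 < \|g\|_1 \,].
\end{aligned}
\]

As noted above, $\neg\mathrm{ucont}(f, \omega)$ has the $\exists y\, \phi(y)$ form, so this is a $\Pi_2$ formula.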

The actual Lemma 3.4 stated above includes some ad hoc simplification:
replacing the particular witnesses $x_1, \ldots, x_n$ and $\delta$ with the measurable set
$B = \bigcup_{i \leq n} (x_i - \delta, x_i + \delta)$, and replacing the continuity of $g$ and $h$ with
just an $L_1$ bound on $h$. These all weaken the assumptions, and therefore
strengthen the overall theorem; they are discovered by observing what properties
actually get used in the proof. (Note that while the proof of the
resulting statement goes through with the weakened assumptions, in particular
without the continuity of $g$, we actually used the continuity of $g$ to
derive the equivalence of the modified statement with the original one.)

Most importantly, the syntactic manipulations have guided us into discovering
that we should replace counting zeros with a bound on those $y$
with $|g(y)| < \zeta$. What the proof-theoretic method tells us is that this is
guaranteed to be the right thing to look at: both that we will be able to
obtain computable bounds for this lemma by using such a restriction, and
that the rest of our proof will ultimately match up with this choice. In particular,
this is a local transformation; while proving a quantitative version
of Lemma 2.8 we don’t need to look at the quantitative part of the other
lemmas to know whether the proof will still work.

4.5. More Complicated Sentences. In the case of Jackson’s Theorem,
the statements of all our lemmas naturally unravel to $\Pi_2$ sentences, even if
they didn’t start that way. In a more general proof (see below for references
to some examples), that might not happen. For instance, an intermediate
step of the proof might be to show a statement of the form

\[ \forall x\, \exists y\, \forall z\, \phi(x, y, z) \]

(if all quantifiers in $\phi$ are over effectively compact domains then this is called
a $\Pi_3$ formula) and use this to prove

\[ \forall u\, \exists v\, \psi(u, v). \]

The proof might proceed something like this:

Given $u$, calculate suitable values of $x$. Then there is a $y$
such that $\forall z\, \phi(x, y, z)$, and we can use this $y$ to calculate a
value $v$ such that $\psi(u, v)$ holds.

However, since there might be no way to compute $y$ from $x$, this might not
actually give us a computable algorithm.

The essential idea of the functional interpretation is to replace the non-quantitative
formula $\forall x\, \exists y\, \forall z\, \phi(x, y, z)$ with a new quantitative analog which
may be strictly weaker than the original version, but which is still sufficient
to carry out the proof. This is best illustrated by an example at the purely
syntactic level: we will replace

\[ \forall x \in \mathcal{X}\; \exists y \in \mathcal{Y}\; \forall z \in \mathcal{Z}\; \phi(x, y, z) \]

with

\[ \forall x \in \mathcal{X}\; \forall Z \in \mathcal{Z}^{\mathcal{Y}}\; \exists y \in \mathcal{Y}\; \phi(x, y, Z(y)). \]

In other words, given $x$ and a function $Z$, we can find a $y$ which works, not
necessarily for every $z$, but at least for the particular value $Z(y)$ returned
by the function. Ultimately we will need to restrict $Z$ to be a computable
function, though $\Pi_3$ statements are simple enough that it does not matter.

In fact, for statements this simple, the modified form is actually equivalent
to the original:

Lemma 4.2. $\forall x\, \exists y\, \forall z\, \phi(x, y, z)$ holds iff $\forall x\, \forall Z\, \exists y\, \phi(x, y, Z(y))$ holds.

Proof. The left to right direction is obvious: if $\forall x\, \exists y\, \forall z\, \phi(x, y, z)$ then, given
$x$, we let $y$ be the corresponding witness, and then $\forall z\, \phi(x, y, z)$ holds, so in
particular, $\phi(x, y, Z(y))$.

For the other direction, we show the contrapositive. Suppose $\exists x\, \forall y\, \exists z\, \neg\phi(x, y, z)$.
Take some value $x$ such that $\forall y\, \exists z\, \neg\phi(x, y, z)$, and consider the function $Z$
which, given $y$, chooses a value $Z(y)$ such that $\neg\phi(x, y, Z(y))$. Then $x, Z$
form a counterexample to $\forall x\, \forall Z\, \exists y\, \phi(x, y, Z(y))$. □
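
On finite domains the equivalence in Lemma 4.2 can be checked by brute force, which may help make the role of the function $Z$ concrete. The Python sketch below is only an illustration under that finiteness assumption; the function names are chosen here, and a function $Z$ from `Y` to `Z` is represented as a tuple whose `i`-th entry is the value at `Y[i]`.

```python
from itertools import product


def forall_exists_forall(phi, X, Y, Z):
    """Evaluate: for all x there is y such that for all z, phi(x, y, z)."""
    return all(any(all(phi(x, y, z) for z in Z) for y in Y) for x in X)


def forall_forall_exists(phi, X, Y, Z):
    """Evaluate: for all x and all functions Zf from Y to Z there is y
    with phi(x, y, Zf(y)). Functions are enumerated as tuples, where
    choice[i] plays the role of Zf(Y[i])."""
    return all(
        any(phi(x, y, choice[i]) for i, y in enumerate(Y))
        for x in X
        for choice in product(Z, repeat=len(Y))
    )


# Hypothetical sample predicates on Y = Z = {0, 1, 2}.
dominates = lambda x, y, z: y >= z      # y = 2 works for every z
differs = lambda x, y, z: y != z        # refuted by the identity function
```

Both evaluators agree on `dominates` (true) and on `differs` (false); in the second case the refuting function is the identity, exactly the kind of counterexample function constructed in the proof.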
Notice that in the left to right direction, our argument was constructive,
in the sense that, given a value of $y$ which worked on the left, we obtained
a value which worked on the right. The right to left direction, however, was
not constructive: it was a proof by contradiction. Knowing a particular
value of $y$ satisfying the formula on the right, or even knowing a general method
for finding, from each $x$ and $Z$, a value of $y$, is not enough to find the single
value of $y$ which works on the left.

The statement

\[ \forall x\, \forall Z\, \exists y\, \phi(x, y, Z(y)) \]

is $\Pi_2$, so we expect to be able to concretely calculate $y$ from $x$ and $Z$. More
surprising is that when we have a proof of a $\Pi_2$ statement $\forall u\, \exists v\, \psi(u, v)$ from
$\forall x\, \exists y\, \forall z\, \phi(x, y, z)$, we also have a proof of $\forall u\, \exists v\, \psi(u, v)$ from $\forall x\, \forall Z\, \exists y\, \phi(x, y, Z(y))$,
and the bounds on $v$ (as a function of $u$) depend only on the bounds on $y$ as
a function of $x$ and $Z$. In other words, the statement $\forall x\, \forall Z\, \exists y\, \phi(x, y, Z(y))$
captures all the computable information present in the original statement.

To handle even more complicated sentences, with yet more alternations of 
quantifiers, we need not only functions, but functionals —operations which 
map functions to functions, and then operations which map functionals to 
functionals, and so on. In order to keep ourselves to countable domains (and 
also to meet our goal of working with quantitative data), in general we need 
to restrict ourselves to computable functions. If the domains in the original 
statement are all countable (and coded appropriately) then it makes sense to 
talk about computable functions from a countable domain to another countable
domain, and there are only countably many such functions. (The outer
quantifier over an uncountable domain becomes an oracle —we fix an object 
from an uncountable domain, but then all further discussion is computable 
relative to the use of that object fixed at the beginning.) 

To each formula $\phi$ we will assign a new formula, $\phi^{ND}$, which will always
have the form

\[ \exists y\, \forall x\; \phi_D(x, y), \]

where $\phi_D$ will only have quantifiers over effectively compact domains. The
intention is that we will have a systematic method for converting proofs of
$\phi$ into a particular choice of $y$ together with a proof that $\forall x\, \phi_D(x, y)$ holds.

The definitions of $\phi_D$ and $\phi^{ND}$ are given by induction on the form of $\phi$.

Remember that we are now restricting ourselves to computable functions,
so when we write $X^Y$ in the following definition, we mean the domain of
computable functions from $Y$ to $X$.

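When $\phi$ is atomic, $\phi_D = \phi$. For the inductive case, suppose we already
have $\phi_D(x, y)$ and $\psi_D(u, v)$ where $x$ has type $X$, $y$ has type $Y$, $u$ has type
$U$, and $v$ has type $V$. Then

(1) $(\phi \wedge \psi)_D(x, u, y, v)$ is $\phi_D(x, y) \wedge \psi_D(u, v)$ and $(\phi \wedge \psi)^{ND}$ is
$\exists y, v\, \forall x, u\; \phi_D(x, y) \wedge \psi_D(u, v)$,

(2) $(\neg\phi)_D(X, y)$ is $\neg\phi_D(X(y), y)$ and $(\neg\phi)^{ND}$ is $\exists X \in X^Y\, \forall y\; \neg\phi_D(X(y), y)$,

(3) $(\phi \to \psi)_D(X, V, y, u)$ is $\phi_D(X(y, u), y) \to \psi_D(u, V(y))$ and $(\phi \to \psi)^{ND}$
is
$\exists X \in X^{Y \times U}, V \in V^Y\, \forall y, u\; (\phi_D(X(y, u), y) \to \psi_D(u, V(y)))$,

(4) $(\forall z \in S\; \phi)_D(x, Y, z)$ is $\phi_D(x, Y(z), z)$ and $(\forall z \in S\; \phi)^{ND}$ is $\exists Y\, \forall x, z\; \phi_D(x, Y(z), z)$.

Instead of defining cases for $\vee$ and $\exists$, we derive them using the de Morgan
laws: $(\exists z\, \phi)^{ND}$ is $(\neg\forall z\, \neg\phi)^{ND}$ and $(\phi \vee \psi)^{ND}$ is $(\neg(\neg\phi \wedge \neg\psi))^{ND}$.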

It is generally useful to think of $\phi^{ND}$ as saying “$y$ is a demonstration
that $\phi$ is true” where a purely mechanical verification that a statement is
true would require checking that $\phi_D(x, y)$ holds for all possible values of $x$.

Then some of the inductive cases are easy to interpret: for instance, if $y$
demonstrates $\phi$ holds and $v$ demonstrates that $\psi$ holds then the pair $y, v$
demonstrates $\phi \wedge \psi$.

A more interesting case is implication; what does it mean to have an
explicit demonstration for $\phi \to \psi$? It means we should have an algorithm
which converts demonstrations of $\phi$ into demonstrations of $\psi$. This is the
function $V$: given a $y$ which demonstrates $\phi$, $V(y)$ is a demonstration of
$\psi$. The complication is that when $y$ fails to be a demonstration of $\phi$ (that is, there
exists some $x$ with $\neg\phi_D(x, y)$) we don’t want $V(y)$ to be arbitrary. Instead
we want to have the property that not only can we determine $V(y)$ from $y$,
but when $u$ is a counter-example to $V(y)$, we can find a counter-example
$X(y, u)$ to $y$. Because it is the most important case, it is worth dwelling on
this point: the interpretation of $\to$ here requires that we have an algorithm
which converts any $y$ into a value $V(y)$, without needing to know whether $y$
actually works or not. The commitment is that if $y$ works then $V(y)$ works
as well, and furthermore that we know how to translate a counter-example
to $V(y)$ into a counter-example to $y$ itself. This imposes something like
a continuity requirement on $V$: if $y$ is “almost right”, in the sense that
counter-examples are rare (for instance, the only counter-examples are very
large values of $x$) then $V(y)$ should be “almost right” as well.
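
A toy instance may make the two functions $V$ and $X$ concrete. In the Python sketch below every name and formula is an assumption chosen for illustration: $\phi$ says “$y$ bounds $f$”, with $\phi_D(x, y)$ being $f(x) \leq y$, and $\psi$ says “$v$ bounds $2f + 1$”, with $\psi_D(u, v)$ being $2f(u) + 1 \leq v$. The implication is then demonstrated by $V(y) = 2y + 1$ together with $X(y, u) = u$.

```python
def f(x):
    """Hypothetical function whose values we want to bound."""
    return (x * x) % 7


def phi_D(x, y):
    """x fails to refute the claim 'y is an upper bound for f'."""
    return f(x) <= y


def psi_D(u, v):
    """u fails to refute the claim 'v is an upper bound for 2f + 1'."""
    return 2 * f(u) + 1 <= v


def V(y):
    """Convert a candidate bound for f into a candidate bound for 2f + 1."""
    return 2 * y + 1


def X(y, u):
    """Convert a counter-example u against V(y) into one against y."""
    return u


def implication_verified(ys, us):
    """Check phi_D(X(y, u), y) -> psi_D(u, V(y)) on finite samples."""
    return all((not phi_D(X(y, u), y)) or psi_D(u, V(y))
               for y in ys for u in us)
```

Note that `V` is defined for every `y`, including values such as `y = 3` that are not upper bounds for `f`. In that case `u = 2` refutes `V(3)`, and `X(3, 2)` returns a point that refutes `y = 3` itself, which is the counter-example translation described above.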

We would like to add the following clause which would cause effectively
compact domains to behave more like finite sets:

When $S$ is an effectively compact domain, $(\forall z \in S\; \phi)_D(x, Y, z)$
is $\exists y \leq Y\; \phi_D(x, y, z)$ and $(\forall z \in S\; \phi)^{ND}$ is
$\exists Y\, \forall x\, \forall z \in S\, \exists y \leq Y\; \phi_D(x, y, z)$.

This says, roughly, that when we quantify over an effectively compact domain,
we obtain a bound on all witnesses needed for all elements of that domain.

Making this idea work turns out to be a bit difficult—this is the subject of 
the monotone m and bounded [14] functional interpretations. However, for
most practical applications, the additional effort is justified. 

There are two properties that are needed to make the ND translation 
useful. The first is that if we have a proof of $\varphi$ in a reasonable theory (for
instance, Peano arithmetic) then we have a particular value of $y$ together
with a proof that $\forall x\, \varphi_D(x,y)$ actually holds. To prove this, we could pin
down a particular formal system of axioms and inference rules, and then
prove that the translation of each axiom is justified, and also that each
inference rule preserves being justified. It is essential that the proof itself is
completely explicit: given a formal proof of $\varphi$, there is an explicit procedure
for converting it into a proof of $\varphi^{ND}$ together with the associated algorithm.
It is convenient that the algorithm is short: a single line in the original proof 
generally translates to a fixed, small number of lines in the new proof (the 
exact values, of course, depending on the particular formal system). 

The second important property of the ND translation is that it does not
alter $\Pi_2$ formulas:

Lemma 4.3. When $\varphi$ is quantifier-free, $(\forall x\, \exists y\, \varphi(x,y))^{ND}$ is equivalent to
the existence of a computable function $Y$ such that $\forall x\, \varphi(x, Y(x))$.

Proof. Since $\varphi$ has no quantifiers, $\varphi^{ND}$ is actually the formula $\varphi$ except that
$\vee$ has been replaced by $\neg(\neg \cdots \wedge \neg \cdots)$, which is equivalent by the de Morgan
law. Then simply plugging in the definitions above, $(\forall x\, \exists y\, \varphi(x,y))^{ND}$ is
equivalent to

$\exists Y\, \forall x\, \varphi(x, Y(x))$

where $Y$ is a computable function. □

This, of course, is exactly what we want: given a proof of $\forall x\, \exists y\, \varphi(x,y)$,
we translate it, step by step, to an explicit proof of $(\forall x\, \exists y\, \varphi(x,y))^{ND}$, which
means we have an actual function $Y$ such that $\forall x\, \varphi(x, Y(x))$. However, the
intermediate steps of the proof have been changed to bring out explicit
information which may have been hidden.
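
To see what such a function $Y$ looks like in a familiar case (an example chosen here, not one discussed in this paper), take $\varphi(x,y)$ to be the decidable statement “$y$ is prime and $y > x$”. Euclid’s argument shows that some prime lies between $x+1$ and $x!+1$, so a bounded search computes a witness. The names `phi` and `Y` are hypothetical, and the sketch assumes nothing beyond the standard library:

```python
from math import factorial

def is_prime(n):
    # Trial division; adequate for the small values used here.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def phi(x, y):
    # The decidable matrix of "for all x there is y": y is a prime above x.
    return y > x and is_prime(y)

def Y(x):
    # Euclid: every prime factor of x! + 1 exceeds x, so a witness
    # exists no later than x! + 1 and the bounded search succeeds.
    for y in range(x + 1, factorial(x) + 2):
        if phi(x, y):
            return y
    raise AssertionError("unreachable by Euclid's argument")
```

The bound $x!+1$ is what the classical proof hands us; the search usually stops far earlier, which is typical of extracted bounds.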

If we used the monotone functional interpretation in place of the ND 
translation described above, we would get a stronger result; essentially 

Lemma 4.4. When $\varphi$ only has quantifiers over effectively compact domains,
$(\forall x\, \exists y\, \varphi(x,y))^{ND}$ is equivalent to the existence of a computable function $Y$
such that $\forall x\, \exists y < Y(x)\, \varphi(x, y)$.

Making precise what $y < Y(x)$ means when $y$ comes from a domain of
complicated functions, however, is a bit tricky. 

We’ve described this as a process applied to completely formal proofs. 
This is not a very practical approach; real proofs, as written in journals, are 
far from being strings of logical formulas. If we had to first translate those 
proofs into formal strings of logical inferences, that alone would be a huge 
process, and empirical experience suggests that the proof becomes several 
times as long when reduced to a completely formal proof. What makes the
functional interpretation useful for actual substantial proofs is that it can 
be applied relatively directly to journal proofs. 


Footnote: The ratio of the length of the formalized proof to that of the informal one is
known as the de Bruijn factor, and a value of 4 seems to be common m-

The functional interpretation is a local transformation: it tells us how to
translate each individual statement. So we can translate particular statements
(say, the statements of individual lemmas) and then fill in the proofs by hand,
knowing that there is a proof (and one close to the original). If this proves
too difficult, we can simply break the proof in half at some convenient point,
translate the statement of the halfway point, and prove two shorter lemmas.

We need the functional interpretation to complete our goal that extraction 
of bounds depends on the syntactic form of the conclusion. No matter what 
the intermediate steps look like, we can use the ND translation to convert 
every step of the proof into an argument with explicit quantitative bounds. 

5. Some Applications of the Functional Interpretation 

5.1. Fixed Point Theorems. One place where non-constructive proofs 
occur naturally is metric spaces (often special kinds of metric spaces, like 
Banach spaces or C*-algebras), where compactness is a powerful, frequently
used tool. The functional interpretation has been used extensively to extract 
quantitative information from such proofs [6lIin ( fT7 1 [25 ([28l[29( [3T ( l32 p37IHT] . 

We’ll consider just one family of examples, fixed point theorems, with an 
eye towards the importance of the syntactic form of statements. A typical 
fixed point theorem is Edelstein’s fixed point theorem [13]:

Theorem 5.1. Let $(K,d)$ be a compact metric space and $f : K \to K$ be
contractive: for any distinct $x, y \in K$, $d(f(x),f(y)) < d(x,y)$. Then for any $x$, the
sequence given by $x_0 = x$, $x_{k+1} = f(x_k)$ converges to a unique $c$ such that
$f(c) = c$.

In general, the statement that a sequence converges is not $\Pi_2$; it has the
form

$\forall \varepsilon\, \exists m\, \forall n\, (n > m \to d(x_n, x_{n+1}) < \varepsilon)$.

For Edelstein’s theorem, contractivity lets us make the following observation:
once we have $d(x_m, c) < \varepsilon/2$, we also have $d(x_n, c) < \varepsilon/2$ for all $n > m$,
and in particular, $d(x_n, x_{n+1}) < \varepsilon$ for $n > m$. If we know what $c$ is, we see
that any $m$ with $d(x_m, c) < \varepsilon/2$ is an $m$ we are looking for. The statement

$\forall \varepsilon\, \exists m\ d(x_m, c) < \varepsilon$

is $\Pi_2$, so we can expect to find explicit bounds for this. While we can’t
expect to actually use $c$ when finding bounds, the choice of $c$ is unique, and
it has an explicit modulus of uniqueness. By methods similar to the ones in
the effective proof of Jackson’s theorem, Kohlenbach showed [28] that from
the modulus of uniqueness, one can construct a modulus of continuity;
that is, one can figure out how $c$ varies as $x_0$ varies. Putting these facts
together (the rate of convergence in the original statement depends on the
convergence of $x_m$ to $c$, where $c$ is continuous in $x_0$) is enough to find a
computable function $N(\varepsilon)$ so that for each $x$, each $\varepsilon$, and each $n > N(\varepsilon)$,
$d(x_n, c) < \varepsilon$ [26]. The function $N$ depends on the diameter of $K$ (that is,
$\sup_{x,y \in K} d(x,y)$) and a modulus of contractivity of $f$: a function $\eta(\varepsilon)$ such
that $d(x,y) > \varepsilon$ implies $d(f(x),f(y)) + \eta(\varepsilon) < d(x,y)$.
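
The role of these two parameters can be seen in a much simpler bound than the $N(\varepsilon)$ of [26]. The following sketch is an illustration added here, not Kohlenbach’s construction. As long as $d(x_n, x_{n+1}) > \varepsilon$, applying $f$ shrinks that distance by more than $\eta(\varepsilon)$, and the distance starts out at most the diameter of $K$, so after $\lceil \mathrm{diam}(K)/\eta(\varepsilon) \rceil$ steps consecutive terms are within $\varepsilon$ of each other. The map $f(x) = x/(1+x)$ on $K = [0,1]$ and the modulus $\eta(\varepsilon) = \varepsilon^2/(1+\varepsilon)$ are assumptions chosen for the example:

```python
import math

def f(x):
    # Contractive on [0, 1] but not a strict contraction:
    # |f(x) - f(y)| = |x - y| / ((1 + x) * (1 + y)).
    return x / (1.0 + x)

def eta(eps):
    # A modulus of contractivity for this f: if |x - y| > eps then
    # |f(x) - f(y)| + eta(eps) < |x - y|.
    return eps * eps / (1.0 + eps)

def steps_until_close(eps, diameter=1.0):
    # Depends only on eps, the diameter and eta, not on the starting point.
    return math.ceil(diameter / eta(eps))

def orbit(x0, length):
    # The sequence x_0, f(x_0), f(f(x_0)), ... with `length` terms.
    xs = [x0]
    for _ in range(length - 1):
        xs.append(f(xs[-1]))
    return xs
```

For this particular $f$ one has $x_n = x_0/(1 + n x_0)$, so the true behaviour is far better than the worst-case count; the point is only that the count is uniform in the starting point.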

To see that these considerations are really necessary, one can consider the 
Krasnoselski fixed point theorem: 

Theorem 5.2 ([35]). Let $K$ be a convex, closed and bounded set in a
uniformly convex Banach space $(X, \|\cdot\|)$, $f$ a mapping of $K$ into a compact
subset of $K$ such that $f$ is non-expansive, that is, $\|f(x) - f(y)\| \le \|x - y\|$
for all $x, y$. Then for every $x_0 \in K$, the sequence given by

$x_{k+1} = \frac{x_k + f(x_k)}{2}$

converges to a $p$ such that $f(p) = p$.

Again this is a convergence statement, so not in a $\Pi_2$ form. Unlike Edelstein’s
fixed point theorem, this is really unavoidable: Kohlenbach gives an
example [30] showing that there can be no algorithm finding bounds from $x_0$
and $f$. However, a similar idea is to observe that the quantity $\|x_n - f(x_n)\|$
is decreasing, and therefore given $\varepsilon$ one can hope to find an $n$ such that
$\|x_n - f(x_n)\| < \varepsilon$. (This is a bound on the asymptotic regularity of the
sequence $(x_n)$.) Asymptotic regularity is $\Pi_2$, and it is therefore unsurprising
that many explicit bounds are known, both by analytic methods [23][34] and
by use of the functional interpretation [30].
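
A small numerical sketch of asymptotic regularity may help. It is an illustration added here, the rotation map and helper names are assumptions, and no bound from the cited papers is reproduced. On the closed unit disc in the plane, rotation by a quarter turn is non-expansive. Its plain iterates circle forever, yet the averaged iteration of Theorem 5.2 drives $\|x_n - f(x_n)\|$ below any $\varepsilon$:

```python
def f(z):
    # Rotation of the complex plane by 90 degrees: an isometry fixing 0.
    return 1j * z

def krasnoselski_step(z):
    # x_{k+1} = (x_k + f(x_k)) / 2
    return (z + f(z)) / 2

def residual(z):
    # The quantity ||x - f(x)|| whose smallness is asymptotic regularity.
    return abs(z - f(z))

def first_regular_index(z0, eps, max_steps=10_000):
    # Least n with ||x_n - f(x_n)|| < eps, found by direct iteration.
    z = z0
    for n in range(max_steps):
        if residual(z) < eps:
            return n
        z = krasnoselski_step(z)
    raise RuntimeError("no index found within max_steps")
```

Here the search is a plain loop rather than an extracted bound; what makes it legitimate is that the property being searched for is decidable at each step and is guaranteed to occur.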

5.2. Ultraproducts and Similar Constructions. Another common source
of non-constructive proofs is the compactness of first-order logic, that is,
nonstandard analysis, or, more generally, the use of ultraproducts. We
begin with a sequence of models $\mathfrak{M}_N$ of a language $\mathcal{L}$ and combine them into
a single model $\mathfrak{M}$ with the property that $\mathfrak{M} \models \varphi$ for a formula $\varphi$ in $\mathcal{L}$ if and
only if for “most” $N$, $\mathfrak{M}_N \models \varphi$. If we prove that $\varphi$ must hold in $\mathfrak{M}$, we can
conclude that it holds in $\mathfrak{M}_N$ for “most”, and certainly for infinitely many,
$N$. (See [3] for discussion of the reverse argument, that ultraproducts can
be used to show the existence of uniform bounds.)

A typical example is the ergodic-theoretic proof of Szemerédi’s Theorem

[laiSI: 

Theorem 5.3. For every $\varepsilon > 0$ and every $k$, there is an $N$ such that if
$A \subseteq [1,N]$ with $|A| > \varepsilon N$ then there are $a$ and $d > 0$ with

$\{a, a+d, a+2d, \ldots, a+(k-1)d\} \subseteq A$.

Note that this is $\Pi_2$: for every $\varepsilon$ and $k$, there is an $N$; the quantifiers over
$A$ and over $a, d$ are over finite sets.
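
The remark about finite sets can be made concrete: once $N$ is fixed, everything after “there is an $N$” is a finite check. The brute-force sketch below is an illustration added here, with hypothetical function names, feasible only for tiny $N$, and with the common difference taken to be at least 1. It searches for the least $N$ that works for given $k$ and $\varepsilon$:

```python
from fractions import Fraction
from itertools import combinations

def has_progression(A, k):
    # True if A contains a, a+d, ..., a+(k-1)d for some d >= 1.
    S = set(A)
    if not S:
        return False
    top = max(S)
    for a in S:
        d = 1
        while a + (k - 1) * d <= top:
            if all(a + i * d in S for i in range(k)):
                return True
            d += 1
    return False

def works(N, k, eps):
    # Finite check: every A within [1, N] with |A| > eps * N
    # contains a k-term arithmetic progression.
    universe = range(1, N + 1)
    for size in range(N + 1):
        if size > eps * N:
            for A in combinations(universe, size):
                if not has_progression(A, k):
                    return False
    return True

def least_N(k, eps, limit=12):
    # Unbounded search over N, cut off at `limit` for the example.
    for N in range(1, limit + 1):
        if works(N, k, eps):
            return N
    return None
```

The theorem guarantees that the outer search terminates but says nothing about when; extracting a bound on $N$ from a proof is exactly the problem at hand.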

Footnote: This general idea has been rediscovered a number of times, especially in various
special cases that don’t depend as heavily on the general logical framework, and is therefore
known by a number of names: Banach limits, the Furstenberg correspondence, vague convergence,
and graphons. These are not all exactly the same notion, but when our concern is the
extraction of computable bounds, the differences are not significant.

If the statement isn’t true, we may fix $\varepsilon > 0$ and $k$, and for each $N$ find an
$A_N \subseteq [1,N]$ with $|A_N| > \varepsilon N$ so that $A_N$ contains no arithmetic progression
of length $k$. We take the language $\mathcal{L}$ containing a unary relation symbol
$A$, a unary function $T$, and for each formula $\varphi(x)$ with only the displayed
free variables and each $q \in \mathbb{Q}^{>0}$ a 0-ary relation symbol $m_{\varphi,q}$. We interpret
$([1,2N], A_N)$ as a model of this language by taking $A_N$ as the interpretation
of $A$, $x \mapsto x + 1 \bmod 2N$ as the interpretation of $T$, and

$([1,2N], A_N) \models m_{\varphi,q} \iff |\{x \mid ([1,2N], A_N) \models \varphi(x)\}| > 2qN$.

(The last clause is technical; it gives us some ability to talk about the measure
of sets using formulas in the language. This is a special case of more
general approaches for considering measures in the context of first-order
logic [11][22].)

We then work in the ultraproduct of these models, which we call $(X, A)$.
Observe that if we can prove, for some $d$, that the ultraproduct satisfies the
formula

$\exists x\, (x \in A \wedge T^{d}x \in A \wedge \cdots \wedge T^{(k-1)d}x \in A)$

then infinitely many of the finite models $([1,2N], A_N)$ satisfy this formula.
Take $N$ much larger than $d$ and let $a \in [1,2N]$ witness this fact. Since
$A_N \subseteq [1,N]$ and each $T^{id}a \in A_N$, we must have $T^{id}a = a + id$ for each
$i < k$. (We are going to a small amount of trouble here to prevent the case
where the progression involves “wrapping around”, since $T$ is interpreted
as addition mod $2N$.) Therefore $a, a+d, \ldots, a+(k-1)d$ is an arithmetic
progression in $A_N$. This gives the desired contradiction, since the $A_N$ were
chosen to be sets with no arithmetic progressions.
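
The wrapping issue is easy to check mechanically. The sketch below is an illustration added here, with hypothetical helper names. It models $T$ on the universe $[1,2N]$ and tests whether points $T^{id}a$ that all land in $[1,N]$ really are the integers $a + id$. When $d \le N$ a step from a point of $[1,N]$ cannot pass $2N$, so no wrapping occurs; for larger $d$ it can:

```python
def T_power(x, steps, N):
    # Apply x -> x + 1 (mod 2N) `steps` times on the universe [1, 2N].
    return (x - 1 + steps) % (2 * N) + 1

def progression_is_genuine(a, d, k, N):
    # If every T^{id} a with i < k lies in [1, N], check T^{id} a == a + i*d.
    points = [T_power(a, i * d, N) for i in range(k)]
    if not all(1 <= p <= N for p in points):
        return True  # hypothesis fails, so there is nothing to check
    return all(p == a + i * d for i, p in enumerate(points))
```

For instance, with $N = 5$, $d = 8$ and $a = 3$ the two points $3$ and $T^{8}3 = 1$ both lie in $[1,5]$ although $3 + 8 = 11$ does not, which is the situation that taking $N$ much larger than $d$ rules out.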

Furstenberg gives a proof [11 that the ultraproduct satisfies this formula 
by way of interpreting the ultraproduct as a dynamical system. (He phrases
his construction quite differently, but the essential idea is the same.) 

If we want to give a constructive version of this proof, we face the following
obstacle. The ergodic theoretic argument involves statements about the
measure, say, $\mu(A \cap TA) > 0$. Translated into a formula, this is

$\exists q \in \mathbb{Q}^{>0}\ m_{Ax \wedge A(Tx),\,q}$.

We need to distinguish between formulas in the precise, formally defined
language $\mathcal{L}$ and formulas in the informal sense we used them in the previous
sentence. This formula, with its quantifier over $\mathbb{Q}^{>0}$, is a formula in the
broader sense, but it isn’t actually a formula in $\mathcal{L}$. In particular, we don’t
automatically have that $(X, A)$ satisfies this formula exactly when most
$([1,2N], A_N)$ satisfy this formula.

There is an obvious attempt at a solution: we could extend $\mathcal{L}$ to a bigger
language with two sorts, where the second sort is intended to represent
$\mathbb{Q}^{>0}$. We would replace $m_{\varphi,q}$ with $m_\varphi(q)$, where $q$ will obviously range over
the second sort. Then we start with two-sorted models $([1,2N], A_N, \mathbb{Q}^{>0})$.
The problem is that when we take the ultraproduct, we also have to take
the ultraproduct in the second sort, so we get a model $(X, A, (\mathbb{Q}^{>0})^*)$, where
$(\mathbb{Q}^{>0})^*$ is the (positive) nonstandard rational numbers, which includes
infinitesimal rationals.

But the formula we want, $\exists q \in \mathbb{Q}^{>0}\ m_{Ax \wedge A(Tx)}(q)$, is not the same as the
formula $\exists q \in (\mathbb{Q}^{>0})^*\ m_{Ax \wedge A(Tx)}(q)$; the latter allows for the possibility that
the measure of $A \cap TA$ is “positive” but infinitesimal.

The issue is that when we talk about the distinction between sets of
positive measure and sets of measure 0, we want to work with the non-compact
domain $\mathbb{Q}^{>0}$. In an ultraproduct, all the domains in the language
of the model are compact, so we can only discuss the distinction between
positive and 0 measure by including quantifiers outside the language of the
ultraproduct.

However, the transfer principle tells us that we have the following
correspondence:

Lemma 5.4. Consider a statement in the form $\forall x \in D\, \exists y \in S\, \varphi(x, y)$ where
$D$ is arbitrary, $S$ is countable, and $\varphi$ is a statement in the language $\mathcal{L}$. Then
$\mathfrak{M} \models \forall x\, \exists y\, \varphi(x, y)$ iff for every $x \in D$ there is a $y \in S$ such that for most $N$,
$\mathfrak{M}_N \models \varphi(x,y)$.

In other words, we have a correspondence between $\Pi_2$ sentences in $\mathfrak{M}$ and
those in the $\mathfrak{M}_N$, namely that $\Pi_2$ statements which are uniformly true in
the finite models are true in the infinite one.

For statements which are not $\Pi_2$, we do not have such a correspondence.
For example, the ergodic proof of Szemerédi’s Theorem uses the mean
ergodic theorem, the relevant case of which is the following statement:

Let $a_n(x) = \frac{1}{n}\sum_{i<n} \chi_A(x+i)$ and let $\|a(x)\| =$

For every $\varepsilon > 0$, there is an $n$ such that for all $m > n$,
$\|a_n(x) - a_m(x)\| < \varepsilon$.

We may treat the inner part of the statement, $\|a_n(x) - a_m(x)\| < \varepsilon$, as
being a formula in $\mathcal{L}$ (working this out in detail requires expanding the
language and getting into technicalities about representing the measure as
part of $\mathcal{L}$). Even so, this isn’t $\Pi_2$, since we have three quantifiers over
countable domains on the outside.

In fact, this statement is true in the finite models, for utterly trivial
reasons: in $([1,2N], A_N)$, take $n$ to be much, much larger than $N$, so that
for any $m > n$, $a_m(x)$ is the function which is very close to $\frac{1}{N}\sum_{i<N} \chi_A(i)$
at every point. This is a vacuous argument, and it doesn’t reflect the real
mathematical content of the mean ergodic theorem. We can’t expect to 
express the full content of saying that an infinite sequence converges in a 
finite model. 

The functional interpretation tells us that every statement implies a $\Pi_2$
statement which captures its computational content. The right statement
here is

For every $\varepsilon > 0$ and every function $F$, there is an $n$ such that
if $F(n) > n$ then $\|a_n - a_{F(n)}\| < \varepsilon$.

This is also true in the finite models, with a bound on $n$ that depends on $\varepsilon$
and $F$, but not on the size of the finite model. This is the statement about
finite models which matches up correctly with the mean ergodic theorem in
the ultraproduct.

We can make this statement slightly nicer with the following observation:
given a function $F$, define $F'$ by

$F'(n) = \begin{cases} \text{the least } m \in [n, F(n)] \text{ such that } \|a_n - a_m\| \ge \varepsilon & \text{if there is one,} \\ F(n) & \text{otherwise.} \end{cases}$

Then $F'$ is computable from $F$, and by applying the statement above to $F'$
instead of $F$, we obtain

For every $\varepsilon > 0$ and every function $F$, there is an $n$ such that
if $F(n) > n$ then for all $m \in [n, F(n)]$, $\|a_n - a_m\| < \varepsilon$.

The usual mean ergodic theorem tells us that the averages $a_n$ stabilize.
This version tells us that we can find arbitrarily long intervals on which the 
average remains stable. This is known as the metastability of the average 

[aiiiiEo]. 
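
The passage from $F$ to $F'$ can be tried out on a toy system. In the sketch below, which is an illustration added here, the cyclic group $\mathbb{Z}/12$ with the shift $x \mapsto x+1$, the set `A`, the use of the $L^2$ norm for the uniform measure, and all function names are assumptions. The weak statement is applied to $F'$ and the resulting $n$ is stable on the whole interval $[n, F(n)]$:

```python
from functools import lru_cache
from math import sqrt

M = 12            # hypothetical finite system: Z/12 with the shift x -> x + 1
A = {0, 1, 4, 7}  # hypothetical set whose indicator function is averaged

@lru_cache(maxsize=None)
def average(n):
    # The vector (a_n(x))_x with a_n(x) = (1/n) * #{i < n : x + i in A}.
    return tuple(sum((x + i) % M in A for i in range(n)) / n for x in range(M))

def dist(n, m):
    # L2 distance between a_n and a_m for the uniform measure on Z/M.
    an, am = average(n), average(m)
    return sqrt(sum((s - t) ** 2 for s, t in zip(an, am)) / M)

def F_prime(F, eps):
    # F'(n): least m in [n, F(n)] with dist(n, m) >= eps, else F(n).
    def Fp(n):
        for m in range(n, F(n) + 1):
            if dist(n, m) >= eps:
                return m
        return F(n)
    return Fp

def weak_metastable(F, eps, limit=10_000):
    # Least n >= 1 such that F(n) > n implies dist(n, F(n)) < eps.
    for n in range(1, limit + 1):
        if F(n) <= n or dist(n, F(n)) < eps:
            return n
    raise RuntimeError("no n found below limit")

def metastable(F, eps, limit=10_000):
    # The stronger statement, obtained by applying the weak one to F'.
    return weak_metastable(F_prime(F, eps), eps, limit)
```

For example, `metastable(lambda n: 2 * n + 5, 0.05)` returns an $n$ for which every $m \in [n, 2n+5]$ satisfies `dist(n, m) < 0.05`. The search here is naive; the point of the functional interpretation is that a bound on such an $n$ can be read off from the proof.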

A recent generalization of this idea is in [16], where the corresponding
notion of metastability for a double limit is used to give an effective version 
of a theorem about functions; this is applied in [35] to give explicit bounds 
for a theorem about Banach spaces—the “failure of local unconditionality 
of the James space”—whose usual proof involves an ultraproduct. 

Note that in this case it’s incidental that what the functional interpretation
extracts from proofs which used ultraproducts is computable; the
same method would work with any language and any replacement for our
finite models. What’s really important is that the functional interpretation
tells us how to match statements about the ultraproduct model with statements
in the original models. In almost every situation where we care about
this, it makes sense to view ourselves as extracting information computable from
the original models, but there has been some recent investigation of this
idea more abstractly, where one has the right syntactic features to extract
information using the functional interpretation, but where one need not be
working with computable information [20][47][35].


6. Further Reading 

There are two well-established and complementary introductions to the 
functional interpretation. Avigad and Feferman [1] give a thorough introduction
to the formal theory of the functional interpretation in the most
important settings, including a number of variants and many applications
within proof theory. Kohlenbach [23] gives a detailed guide to the applications
of the functional interpretation outside logic, especially in analysis.